2021-09-06 12:38:40

by Naresh Kamboju

[permalink] [raw]
Subject: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

[Please ignore this email if it is already reported ]
Following build warnings/ errors noticed while building linux mainline
master branch
with gcc-11 for arm architecture with provided config file.

make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm
CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache
arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc' olddefconfig
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=arm
CROSS_COMPILE=arm-linux-gnueabihf- 'CC=sccache
arm-linux-gnueabihf-gcc' 'HOSTCC=sccache gcc'
/builds/linux/arch/arm/boot/dts/bcm2711-rpi-4-b.dts:220.10-231.4:
Warning (pci_device_reg): /scb/pcie@7d500000/pci@1,0: PCI unit address
format error, expected 0,0
/builds/linux/arch/arm/boot/dts/bcm2711-rpi-4-b.dts:220.10-231.4:
Warning (pci_device_reg): /scb/pcie@7d500000/pci@1,0: PCI unit address
format error, expected 0,0
/builds/linux/net/ipv4/tcp.c: In function 'do_tcp_getsockopt.constprop':
/builds/linux/net/ipv4/tcp.c:4234:1: error: the frame size of 1152
bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
4234 | }
| ^
cc1: all warnings being treated as errors
make[3]: *** [/builds/linux/scripts/Makefile.build:277: net/ipv4/tcp.o] Error 1
/builds/linux/fs/select.c: In function 'do_sys_poll':
/builds/linux/fs/select.c:1041:1: error: the frame size of 1264 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]
1041 | }
| ^
cc1: all warnings being treated as errors
make[2]: *** [/builds/linux/scripts/Makefile.build:277: fs/select.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:540: net/ipv4] Error 2
make[2]: Target '__build' not remade because of errors.
make[1]: *** [/builds/linux/Makefile:1872: fs] Error 2
/builds/linux/net/mac80211/mlme.c: In function 'ieee80211_sta_rx_queued_mgmt':
/builds/linux/net/mac80211/mlme.c:4345:1: error: the frame size of
1040 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
4345 | }
| ^
cc1: all warnings being treated as errors
make[3]: *** [/builds/linux/scripts/Makefile.build:277:
net/mac80211/mlme.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:540: net/mac80211] Error 2
make[2]: Target '__build' not remade because of errors.
make[1]: *** [/builds/linux/Makefile:1872: net] Error 2
/builds/linux/drivers/base/test/property-entry-test.c: In function
'pe_test_uint_arrays':
/builds/linux/drivers/base/test/property-entry-test.c:250:1: error:
the frame size of 1040 bytes is larger than 1024 bytes
[-Werror=frame-larger-than=]
250 | }
| ^
cc1: all warnings being treated as errors
make[4]: *** [/builds/linux/scripts/Makefile.build:277:
drivers/base/test/property-entry-test.o] Error 1
make[4]: Target '__build' not remade because of errors.
make[3]: *** [/builds/linux/scripts/Makefile.build:540:
drivers/base/test] Error 2
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/base] Error 2
/builds/linux/drivers/usb/host/xhci.c: In function 'xhci_reserve_bandwidth':
/builds/linux/drivers/usb/host/xhci.c:2867:1: error: the frame size of
1064 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
2867 | }
| ^
cc1: all warnings being treated as errors
make[4]: *** [/builds/linux/scripts/Makefile.build:277:
drivers/usb/host/xhci.o] Error 1
make[4]: Target '__build' not remade because of errors.
make[3]: *** [/builds/linux/scripts/Makefile.build:540:
drivers/usb/host] Error 2
/builds/linux/drivers/net/ethernet/qlogic/qede/qede_filter.c: In
function 'qede_config_rx_mode':
/builds/linux/drivers/net/ethernet/qlogic/qede/qede_filter.c:1278:1:
error: the frame size of 1072 bytes is larger than 1024 bytes
[-Werror=frame-larger-than=]
1278 | }
| ^
cc1: all warnings being treated as errors
make[6]: *** [/builds/linux/scripts/Makefile.build:277:
drivers/net/ethernet/qlogic/qede/qede_filter.o] Error 1
make[6]: Target '__build' not remade because of errors.
make[5]: *** [/builds/linux/scripts/Makefile.build:540:
drivers/net/ethernet/qlogic/qede] Error 2
make[5]: Target '__build' not remade because of errors.
make[4]: *** [/builds/linux/scripts/Makefile.build:540:
drivers/net/ethernet/qlogic] Error 2
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/usb] Error 2
make[4]: Target '__build' not remade because of errors.
make[3]: *** [/builds/linux/scripts/Makefile.build:540:
drivers/net/ethernet] Error 2
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/net] Error 2
/builds/linux/drivers/media/dvb-frontends/mxl5xx.c: In function 'config_ts':
/builds/linux/drivers/media/dvb-frontends/mxl5xx.c:1575:1: error: the
frame size of 1232 bytes is larger than 1024 bytes
[-Werror=frame-larger-than=]
1575 | }
| ^
cc1: all warnings being treated as errors
make[4]: *** [/builds/linux/scripts/Makefile.build:277:
drivers/media/dvb-frontends/mxl5xx.o] Error 1
make[4]: Target '__build' not remade because of errors.
make[3]: *** [/builds/linux/scripts/Makefile.build:540:
drivers/media/dvb-frontends] Error 2
make[3]: Target '__build' not remade because of errors.
make[2]: *** [/builds/linux/scripts/Makefile.build:540: drivers/media] Error 2
make[2]: Target '__build' not remade because of errors.
make[1]: *** [/builds/linux/Makefile:1872: drivers] Error 2
make[1]: Target '__all' not remade because of errors.
make: *** [Makefile:219: __sub-make] Error 2
make: Target '__all' not remade because of errors.

Build config:
https://builds.tuxbuild.com/1xjZpYj47LrrGs1OE1xApznPplW/config

Reported-by: Linux Kernel Functional Testing <[email protected]>

meta data:
-----------
git_describe: v5.14-9687-g27151f177827
git_ref:
git_repo: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline
git_sha: 27151f177827d478508e756c7657273261aaf8a9
git_short_log: 27151f177827 (\Merge tag
'perf-tools-for-v5.15-2021-09-04' of
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux\)
kconfig: [
defconfig
https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux/files/lkft.config
https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux/files/lkft-crypto.config
https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux/files/distro-overrides.config
https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux/files/systemd.config
https://raw.githubusercontent.com/Linaro/meta-lkft/sumo/recipes-kernel/linux/files/virtio.config
CONFIG_ARM_TI_CPUFREQ=y
CONFIG_SERIAL_8250_OMAP=y
CONFIG_POSIX_MQUEUE=y
CONFIG_OF=y
CONFIG_KASAN=y
CONFIG_KUNIT=y
CONFIG_KUNIT_ALL_TESTS=y
],
kernel_version: 5.14.0
target_arch: arm
toolchain: gcc-11

steps to reproduce:
https://builds.tuxbuild.com/1xjZpYj47LrrGs1OE1xApznPplW/tuxmake_reproducer.sh

--
Linaro LKFT
https://lkft.linaro.org


2021-09-07 22:17:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Mon, Sep 6, 2021 at 5:27 AM Naresh Kamboju <[email protected]> wrote:
>
> /builds/linux/net/ipv4/tcp.c: In function 'do_tcp_getsockopt.constprop':
> /builds/linux/net/ipv4/tcp.c:4234:1: error: the frame size of 1152

None of these seem to be new.

It looks like this is (yet another) example of some build testing that
just never cared about warnings.

Which was the exact problem that the new -Werror flag is all about.
All these build test servers that only go "ok, it built, never mind
all the warnings".

Everybody tells me that the build servers care about warnings.
Everything I actually see tells a very different story indeed.

Linus

2021-09-07 23:37:20

by Nathan Chancellor

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On 9/7/2021 4:14 PM, Linus Torvalds wrote:
> There are many more of these cases. I've seen Hyper-V allocate 'struct
> cpumask' on the stack, which is once again an absolute no-no that
> people have apparently just ignored the warning for. When you have
> NR_CPUS being the maximum of 8k, those bits add up, and a single
> cpumask is 1kB in size. Which is why you should never do that on
> stack, and instead use '
>
> cpumask_var_t mask;
> alloc_cpumask_var(&mask,..)
>
> which will do a much more reasonable job. But the reason I call out
> hyperv is that as far as I know, hyperv itself doesn't actually
> support 8192 CPU's. So all that apic noise with 'struct cpumask' that
> uses 1kB of data when NR_CPUS is set to 8192 is just wasted. Maybe I'm
> wrong. Adding hyperv people to the cc too.

I am only commenting on this because I was looking into an instance of
this warning yesterday with Fedora's arm64 config, which has
CONFIG_NR_CPUS=4096:

https://lore.kernel.org/r/[email protected]/

Won't your example only fix the issue with CONFIG_CPUMASK_OFFSTACK=y or
am I misreading the gigantic comment in include/linux/cpumask.h? As far
as I can tell, only x86 selects it and it is not user configurable
unless CONFIG_DEBUG_PER_CPU_MAPS is set, which is buried in the debug
options so most people won't bother trying to enable it. If I understand
correctly, how should these be dealt with in the case of
CONFIG_CPUMASK_OFFSTACK=n?

Cheers,
Nathan

2021-09-07 23:53:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

[ Added maintainers for various bits and pieces, since I spent the
time trying to look at why those bits and pieces wasted stack-space
and caused problems ]

On Tue, Sep 7, 2021 at 3:16 PM Linus Torvalds
<[email protected]> wrote:
>
> None of these seem to be new.

That said, all but one of them seem to be (a) very much worth fixing
and (b) easy to fix.

The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of
different case statements, many of them with their own struct
allocations on stack, and all of them disjoint".

So the fix for that would be the traditional one of just making helper
functions for the cases that aren't entirely trivial. We've done that
before, and it not only fixes stack usage problems, it results in code
that is easier to read, and generally actually performs better too
(exactly because it avoids sparse stacks and extra D$ use)

The pe_test_uints() one is the same horrendous nasty Kunit pattern
that I fixed in commit 4b93c544e90e ("thunderbolt: test: split up test
cases in tb_test_credit_alloc_all") that had an even worse case.

The KUNIT macros create all these individually reasonably small
initialized structures on stack, and when you have more than a small
handful of them the KUNIT infrastructure just makes the stack space
explode. Sometimes the compiler will be able to re-use the stack
slots, but it seems to be an iffy proposition to depend on it - it
seems to be a combination of luck and various config options.

I detest code that exists for debugging or for testing, and that
violates fundamental rules and causes more problems in the process.

The mac802.11 one seems to be due to 'struct ieee802_11_elems' being
big, and allocated on the stack. I think it's probably made worse
there with inlining, ie

- ieee80211_sta_rx_queued_mgmt() has one copy

- ieee80211_rx_mgmt_beacon() is possibly inlined, and has its own copy

but even if it isn't due to that kind of duplication due to inlining,
that code is dangerous. Exactly because it has two nested stack frames
with that big structure, and they are active at the same time in the
callchain whether inlined or not.

And it's *pointlessly* dangerous, because the 'struct ieee802_11_elems
elems' in ieee80211_sta_rx_queued_mgmt() is only used for the
IEEE80211_STYPE_ACTION case, so it is entirely disjoint from the
IEEE80211_STYPE_BEACON case, and those stack allocations simply should
not nest like that in the first place.

Making the IEEE80211_STYPE_ACTION case be its own function - like the
other cases - and moving the struct there should fix it. Possibly a
"noinline" or two necessary to make sure that the compiler doesn't
then undo the "these two cases are disjoint" thing.

The qede_config_rx_mode() has a fairly big 'struct qed_filter_params'
structure on stack (it's mainly just a union of two other structures).
That one is a bit silly, because the very same function *alsu* does a
temporary allocation for the 'uc_macs[]' array, and I think it could
have literally made that allocation just do both the params and the
uc_macs[] array together.

But that "a bit silly" is actually *doubly* silly, because that big
structure allocated for the stack that is actually a union, uses the
QED_FILTER_TYPE_RX_MODE type of the union. Which in turn is literally
*one*single*enum*field*.

So the qede_config_rx_mode() case uses that chunk of kernel stack for
a big union for no good reason. It really only wants two words, but
the way the code is written, it uses a lot, because the union also has
a 'struct qed_filter_mcast_params' member that has an array of
[64][ETH_ALEN] bytes in it.

So that's about 400 bytes of stack space entirely wasted if I read the
code correctly.

The xhci_reserve_bandwidth() one is because it has an array of 31
'struct xhci_bw_info' on the stack. It's not a huge structure (six
32-bit words), but when you have 31 of those in an array, it's about
750 bytes right there. It should likely just be dynamically allocated
- it doesn't seem to be some super-important critical thing where an
allocation cannot be done.

The do_sys_poll() thing is a bit sad. The code has been tweaked to
basically use 1kB of stack space in one configuration. It overflows it
in a lot of other configs. Using stack space for those kinds of
top-level functions that are guaranteed to have an empty stack is
pretty much the best possible situation, but it's one where we don't
really have a good way to try to have some kind of dynamic feedback
from the compiler for how much other stack space it is using.

So that do_sys_poll() case is the only one I see where the stack usage
is actually fine and explicitly expected. We *aim* for 1kB of stack,
and then in some - probably quite a few - situations we go over.

There are many more of these cases. I've seen Hyper-V allocate 'struct
cpumask' on the stack, which is once again an absolute no-no that
people have apparently just ignored the warning for. When you have
NR_CPUS being the maximum of 8k, those bits add up, and a single
cpumask is 1kB in size. Which is why you should never do that on
stack, and instead use '

cpumask_var_t mask;
alloc_cpumask_var(&mask,..)

which will do a much more reasonable job. But the reason I call out
hyperv is that as far as I know, hyperv itself doesn't actually
support 8192 CPU's. So all that apic noise with 'struct cpumask' that
uses 1kB of data when NR_CPUS is set to 8192 is just wasted. Maybe I'm
wrong. Adding hyperv people to the cc too.

A lot of the stack frame size warnings are hidden by the fact that our
default value for warning about stack usage is 2kB for 64-bit builds.

Probably exactly because people did things like that cpumask thing,
and have these arrays of structures that are often even bigger in the
64-bit world.

Linus

2021-09-07 23:56:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, Sep 7, 2021 at 4:35 PM Nathan Chancellor <[email protected]> wrote:
>
> Won't your example only fix the issue with CONFIG_CPUMASK_OFFSTACK=y

Yes, but..

> or am I misreading the gigantic comment in include/linux/cpumask.h?

you're not misreading the comment, but you are missing this important fact:

config NR_CPUS_RANGE_END
int
depends on X86_64
default 8192 if SMP && CPUMASK_OFFSTACK
default 512 if SMP && !CPUMASK_OFFSTACK
default 1 if !SMP

so basically you can't choose more than 512 CPU's unless
CPUMASK_OFFSTACK is set.

Of course, we may have some bug in the Kconfig elsewhere, and I didn't
check other architectures. So maybe there's some way to work around
it.

But basically the rule is that CPUMASK_OFFSTACK and NR_CPUS are linked.

That linkage is admittedly a bit hidden and much too subtle. I think
the only real reason why it's done that way is because people wanted
to do test builds with CPUMASK_OFFSTACK even without having to have
some ludicrous number of NR_CPUS.

You'll notice that the question "CPUMASK_OFFSTACK" is only enabled if
DEBUG_PER_CPU_MAPS is true.

That whole "for debugging" reason made more sense a decade ago when
this was all new and fancy.

It might make more sense to do that very explicitly, and make
CPUMASK_OFFSTACK be just something like

config NR_CPUS_RANGE_END
def_bool NR_CPUS <= 512

and get rid of the subtlety and choice in the matter.

Linus

2021-09-08 00:17:00

by Nathan Chancellor

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On 9/7/2021 4:49 PM, Linus Torvalds wrote:
> On Tue, Sep 7, 2021 at 4:35 PM Nathan Chancellor <[email protected]> wrote:
>>
>> Won't your example only fix the issue with CONFIG_CPUMASK_OFFSTACK=y
>
> Yes, but..
>
>> or am I misreading the gigantic comment in include/linux/cpumask.h?
>
> you're not misreading the comment, but you are missing this important fact:
>
> config NR_CPUS_RANGE_END
> int
> depends on X86_64
> default 8192 if SMP && CPUMASK_OFFSTACK
> default 512 if SMP && !CPUMASK_OFFSTACK
> default 1 if !SMP
>
> so basically you can't choose more than 512 CPU's unless
> CPUMASK_OFFSTACK is set.
>
> Of course, we may have some bug in the Kconfig elsewhere, and I didn't
> check other architectures. So maybe there's some way to work around
> it.

Ah, okay, that is an x86-only limitation so I missed it. I do not think
there is any bug with that Kconfig logic but it is only used on x86.

> But basically the rule is that CPUMASK_OFFSTACK and NR_CPUS are linked.
>
> That linkage is admittedly a bit hidden and much too subtle. I think
> the only real reason why it's done that way is because people wanted
> to do test builds with CPUMASK_OFFSTACK even without having to have
> some ludicrous number of NR_CPUS.
>
> You'll notice that the question "CPUMASK_OFFSTACK" is only enabled if
> DEBUG_PER_CPU_MAPS is true.
>
> That whole "for debugging" reason made more sense a decade ago when
> this was all new and fancy.
>
> It might make more sense to do that very explicitly, and make
> CPUMASK_OFFSTACK be just something like
>
> config NR_CPUS_RANGE_END
> def_bool NR_CPUS <= 512
>
> and get rid of the subtlety and choice in the matter.

Indeed. Grepping around the tree, I see that arc, arm64, ia64, powerpc,
and sparc64 all support NR_CPUS up to 4096 (8192 for PPC) but none of
them select CPUMASK_OFFSTACK so it seems like they should test support
for CPUMASK_OFFSTACK and adopt similar logic to x86 to limit how much
stack space cpumask variables can use. Like you mentioned, it probably
has not come up before because most of those are 64-bit platforms that
have a higher default FRAME_WARN value (and the default NR_CPUS values
on all of them is small). I only noticed because Fedora sets NR_CPUS to
4096 for arm64 and has a FRAME_WARN value of 1024, meaning two cpumask
variables in the same frame puts that frame right at the 1024 limit.

Cheers,
Nathan

2021-09-08 01:45:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, Sep 7, 2021 at 5:15 PM Nathan Chancellor <[email protected]> wrote:
>
> Ah, okay, that is an x86-only limitation so I missed it. I do not think
> there is any bug with that Kconfig logic but it is only used on x86.

Yeah. I think other architectures are basically just buggy, but nobody
has ever cared or noticed, because the hardware basically doesn't
exist.

People hopefully don't actually configure for the kernel theoretical
maximum outside of x86. Even there it's a bit ridiculous, but the
hardware with lots and lots of cores at least _has_ existed.

> Indeed. Grepping around the tree, I see that arc, arm64, ia64, powerpc,
> and sparc64 all support NR_CPUS up to 4096 (8192 for PPC) but none of
> them select CPUMASK_OFFSTACK

I think this one says it all:

arch/arc/Kconfig: range 2 4096

yeah. ARC allows you to configure 4k CPU's.

I think a lot of them have just copied the x86 code (it was 4k long
ago), without actually understanding all the details.

That is indeed a strong argument for getting rid of the current
much-too-subtle CPUMASK_OFFSTACK situation.

But as the hyperv code shows, even on x86 the "we never allocate a
full cpumask on the stack" has gotten lost a bit since the days that
Rusty and others actually implemented this all.

Linus

2021-09-08 01:47:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, Sep 7, 2021 at 6:35 PM Linus Torvalds
<[email protected]> wrote:
>
> I think a lot of them have just copied the x86 code (it was 4k long
> ago), without actually understanding all the details.

Just to put the x86 number in perspective: it was raised to 8192 back
in 2013, with the comment

x86/cpu: Increase max CPU count to 8192

The MAXSMP option is intended to enable silly large numbers of
CPUs for testing purposes. The current value of 4096 isn't very
silly any longer as there are actual SGI machines that approach
6096 CPUs when taking HT into account.

Increase the value to a nice round 8192 to account for this and
allow for short term future increases.

so on the x86 side, people have actually done these things.

Other architectures? I think some IBM power9 machines can hit 192
cores (with SMT4 - so NR_CPUS of 768), but I don't think there's been
an equivalent of an SGI for anything but x86.

But admittedly I haven't checked or followed those things. I could
easily imagine some boutique super-beefy setup.

Linus

2021-09-08 07:11:38

by Johannes Berg

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, 2021-09-07 at 16:14 -0700, Linus Torvalds wrote:
>
> The mac802.11 one seems to be due to 'struct ieee802_11_elems' being
> big, and allocated on the stack. I think it's probably made worse
> there with inlining, ie
>
>  - ieee80211_sta_rx_queued_mgmt() has one copy
>
>  - ieee80211_rx_mgmt_beacon() is possibly inlined, and has its own copy
>
> but even if it isn't due to that kind of duplication due to inlining,
> that code is dangerous. Exactly because it has two nested stack frames
> with that big structure, and they are active at the same time in the
> callchain whether inlined or not.
>
> And it's *pointlessly* dangerous, because the 'struct ieee802_11_elems
> elems' in ieee80211_sta_rx_queued_mgmt() is only used for the
> IEEE80211_STYPE_ACTION case, so it is entirely disjoint from the
> IEEE80211_STYPE_BEACON case, and those stack allocations simply should
> not nest like that in the first place.
>
> Making the IEEE80211_STYPE_ACTION case be its own function - like the
> other cases - and moving the struct there should fix it. Possibly a
> "noinline" or two necessary to make sure that the compiler doesn't
> then undo the "these two cases are disjoint" thing.

Yeah, I'm aware, and I agree. We've been looking at it every now and
then. This got made worse by us actually adding a fair amount of
pointers to the struct recently (in this merge window).

Ultimately, every new spec addition ends up needing to add something
there, so I think ultimately we'll probably want to either dynamically
allocate it somewhere (perhaps in a data structure used here already),
or possibly not have this at all and just find a way to return only the
bits that are interesting. Even parsing a ~1k frame (typical, max ~2k) a
handful of times is probably not even worse than having this large a
structure that gets filled data that's probably useless in many cases (I
think the different cases all just need a subset). But not sure, I'll
take a look.

johannes

2021-09-08 10:05:58

by Wei Liu

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, Sep 07, 2021 at 04:14:24PM -0700, Linus Torvalds wrote:
> [ Added maintainers for various bits and pieces, since I spent the
> time trying to look at why those bits and pieces wasted stack-space
> and caused problems ]
>
> On Tue, Sep 7, 2021 at 3:16 PM Linus Torvalds
> <[email protected]> wrote:
[...]
>
> There are many more of these cases. I've seen Hyper-V allocate 'struct
> cpumask' on the stack, which is once again an absolute no-no that
> people have apparently just ignored the warning for. When you have
> NR_CPUS being the maximum of 8k, those bits add up, and a single
> cpumask is 1kB in size. Which is why you should never do that on
> stack, and instead use '
>
> cpumask_var_t mask;
> alloc_cpumask_var(&mask,..)
>
> which will do a much more reasonable job. But the reason I call out
> hyperv is that as far as I know, hyperv itself doesn't actually
> support 8192 CPU's. So all that apic noise with 'struct cpumask' that
> uses 1kB of data when NR_CPUS is set to 8192 is just wasted. Maybe I'm
> wrong. Adding hyperv people to the cc too.
>
> A lot of the stack frame size warnings are hidden by the fact that our
> default value for warning about stack usage is 2kB for 64-bit builds.
>
> Probably exactly because people did things like that cpumask thing,
> and have these arrays of structures that are often even bigger in the
> 64-bit world.
>

Thanks for the heads-up. I found one instance of this bad practice in
hv_apic.c. Presumably that's the one you were referring to.

However calling into the allocator from that IPI path seems very heavy
weight. I will discuss with fellow engineers on how to fix it properly.

Wei.

> Linus

2021-09-08 14:04:20

by Thorsten Glaser

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, 7 Sep 2021, Linus Torvalds wrote:

> The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of
> different case statements, many of them with their own struct
> allocations on stack, and all of them disjoint".

Any compiler developers here? AFAIK the compiler knows the lifetime
of function-local variables, so why not alias the actual memory
locations and ranges to minimise stack usage?

bye,
//mirabilos
--
Infrastrukturexperte • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephon +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

****************************************************
/⁀\ The UTF-8 Ribbon
╲ ╱ Campaign against Mit dem tarent-Newsletter nichts mehr verpassen:
╳ HTML eMail! Also, https://www.tarent.de/newsletter
╱ ╲ header encryption!
****************************************************

2021-09-08 14:13:28

by Shuah Khan

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On 9/7/21 5:14 PM, Linus Torvalds wrote:
> [ Added maintainers for various bits and pieces, since I spent the
> time trying to look at why those bits and pieces wasted stack-space
> and caused problems ]
>
> On Tue, Sep 7, 2021 at 3:16 PM Linus Torvalds
> <[email protected]> wrote:
>>
>> None of these seem to be new.
>
> That said, all but one of them seem to be (a) very much worth fixing
> and (b) easy to fix.
>
> The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of
> different case statements, many of them with their own struct
> allocations on stack, and all of them disjoint".
>
> So the fix for that would be the traditional one of just making helper
> functions for the cases that aren't entirely trivial. We've done that
> before, and it not only fixes stack usage problems, it results in code
> that is easier to read, and generally actually performs better too
> (exactly because it avoids sparse stacks and extra D$ use)
>
> The pe_test_uints() one is the same horrendous nasty Kunit pattern
> that I fixed in commit 4b93c544e90e ("thunderbolt: test: split up test
> cases in tb_test_credit_alloc_all") that had an even worse case.
>
> The KUNIT macros create all these individually reasonably small
> initialized structures on stack, and when you have more than a small
> handful of them the KUNIT infrastructure just makes the stack space
> explode. Sometimes the compiler will be able to re-use the stack
> slots, but it seems to be an iffy proposition to depend on it - it
> seems to be a combination of luck and various config options.
>

I have been concerned about these macros creeping in for a while.
I will take a closer look and work with Brendan to come with a plan
to address it.

> I detest code that exists for debugging or for testing, and that
> violates fundamental rules and causes more problems in the process.
>

Having recently debugged a problem spinlock hold in an invalid context
related to debug code, I agree with you on this that test and debug code
shouldn't cause problems.

thanks,
-- Shuah

2021-09-08 15:01:02

by Eric Dumazet

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 6:48 AM Thorsten Glaser <[email protected]> wrote:
>
> On Tue, 7 Sep 2021, Linus Torvalds wrote:
>
> > The do_tcp_getsockopt() one in tpc.c is a classic case of "lots of
> > different case statements, many of them with their own struct
> > allocations on stack, and all of them disjoint".
>
> Any compiler developers here? AFAIK the compiler knows the lifetime
> of function-local variables, so why not alias the actual memory
> locations and ranges to minimise stack usage?
>

At least on my builds, do_tcp_getsockopt() uses less than 512 bytes of stack.

Probably because tcp_zerocopy_receive() is _not_ inlined, by pure luck
I suppose.

Perhaps we should use noinline_for_stack here.

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e8b48df73c852a48e51754ea98b1e08bf024bb9e..437910c096b202420518c9e5e5cd26b2194d8aa2
100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2054,9 +2054,10 @@ static void tcp_zc_finalize_rx_tstamp(struct sock *sk,
}

#define TCP_ZEROCOPY_PAGE_BATCH_SIZE 32
-static int tcp_zerocopy_receive(struct sock *sk,
- struct tcp_zerocopy_receive *zc,
- struct scm_timestamping_internal *tss)
+static noinline_for_stack int
+tcp_zerocopy_receive(struct sock *sk,
+ struct tcp_zerocopy_receive *zc,
+ struct scm_timestamping_internal *tss)
{
u32 length = 0, offset, vma_len, avail_len, copylen = 0;
unsigned long address = (unsigned long)zc->address;

2021-09-08 15:01:03

by David Laight

[permalink] [raw]
Subject: RE: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

From: Wei Liu
> Sent: 08 September 2021 11:03
...
> However calling into the allocator from that IPI path seems very heavy
> weight. I will discuss with fellow engineers on how to fix it properly.

Isn't the IPI code something that is likely to get called
when a lot of stack has already been used?

So you really shouldn't be using much stack at all??

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-09-08 15:50:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 7:50 AM Eric Dumazet <[email protected]> wrote:
>
> At least on my builds, do_tcp_getsockopt() uses less than 512 bytes of stack.
>
> Probably because tcp_zerocopy_receive() is _not_ inlined, by pure luck
> I suppose.
>
> Perhaps we should use noinline_for_stack here.

I agree that that is likely a good idea, but I also suspect that the
stack growth may be related to other issues. So it being less than 512
bytes for you may be related to other random noise than inlining.

In the past I've seen at least two patterns

(a) not merging stack slots at all

(b) some odd "pattern allocator" problems, where I think gcc ended up
re-using previous stack slots if they were the right size, but failing
when previous allocations were fragmented

that (a) thing is what -fconserve-stack is all about, and we also used
to have (iirc) -fno-defer-pop to avoid having function call argument
stacks stick around.

And (b) is one of those "random allocation pattern" things, which
depends on the phase of the moon, where gcc ends up treating the stack
frame as a series of fixed-size allocations, but isn't very smart
about it. Even if some allocations got free'd, they might be
surrounded by oithers that didn't, and then gcc wouldn't re-use them
if there's a bigger allocation afterwards. And similarly, I don't
think gcc ever even joins together two free'd stack frame allocations.

I also wouldn't be surprised at all if some of our hardening flags
ended up causing the stack frame reuse to entirely fail. IOW, I could
easily see things like INIT_STACK_ALL_ZERO might cause the compiler to
initialize all the stack frame allocations "early", so that their
lifetimes all overlap.

So it could easily be about very subtle and random code generation
choices that just change the order of allocation. A spill in the wrong
place, things like that.

Or it could be about not-so-subtle big config option things.

Linus

2021-09-08 16:08:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 3:03 AM Wei Liu <[email protected]> wrote:
>
> Thanks for the heads-up. I found one instance of this bad practice in
> hv_apic.c. Presumably that's the one you were referring to.

Yeah, that must have been the one I saw.

> However calling into the allocator from that IPI path seems very heavy
> weight. I will discuss with fellow engineers on how to fix it properly.

In other places, the options have been fairly straightforward:

(a) avoid the allocation entirely.

I think the main reason hyperv does it is because it wants to clear
the "current cpu" from the cpumask for the ALL_BUT_SELF case, and if
you can just instead keep track of that "all but self" bit separately
and pass it down the call chain, you may not need that allocation at
all.

(b) use a static percpu allocation

The IPI code generally wants interrupts disabled anyway, or it
certainly can just do the required preemption disable to make sure
that it has exclusive access to a percpu allocation.

(c) take advantage of any hypervisor limitations

If hyperv is limited to some smaller number of CPU's - google seems to
imply 240 - maybe you can keep such a smaller allocation on the stack,
but just verify that the incoming cpumask is less than whatever the
hyperv limit is.

That (c) is obviously the worst choice in some sense, but in some
cases limiting yourself to simplify things isn't wrong.

I suspect the percpu allocation model is the easiest one. It's what
other IPI code does. See for example

arch/x86/kernel/apic/x2apic_cluster.c:
__x2apic_send_IPI_mask()

and that percpu 'ipi_mask' thing.

That said, if you are already just iterating over the mask, doing (a)
may be trivial. No allocation at all is even better than a percpu one.

I'm sure there are other options. The above is just the obvious ones
that come to mind.

Linus

2021-09-08 16:58:40

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 5:49 PM Linus Torvalds
<[email protected]> wrote:
> On Wed, Sep 8, 2021 at 7:50 AM Eric Dumazet <[email protected]> wrote:
> In the past I've seen at least two patterns
>
> (a) not merging stack slots at all
>
> (b) some odd "pattern allocator" problems, where I think gcc ended up
> re-using previous stack slots if they were the right size, but failing
> when previous allocations were fragmented
>
> that (a) thing is what -fconserve-stack is all about, and we also used
> to have (iirc) -fno-defer-pop to avoid having function call argument
> stacks stick around.

CONFIG_KASAN_STACK leads to (a), and this has been the source of
long discussions about whether to turn it off altogether, the current state
being that it's disabled for CONFIG_COMPILE_TEST on clang because
this can explode the stack usage even further when it starts spilling
registers.

gcc also runs into this problem, but at least newer versions at not nearly
as bad as they used to be in the past.

Arnd

2021-09-08 17:05:10

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 3:43 AM Linus Torvalds
<[email protected]> wrote:
>
> On Tue, Sep 7, 2021 at 6:35 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > I think a lot of them have just copied the x86 code (it was 4k long
> > ago), without actually understanding all the details.
>
> Just to put the x86 number in perspective: it was raised to 8192 back
> in 2013, with the comment
>
> x86/cpu: Increase max CPU count to 8192
>
> The MAXSMP option is intended to enable silly large numbers of
> CPUs for testing purposes. The current value of 4096 isn't very
> silly any longer as there are actual SGI machines that approach
> 6096 CPUs when taking HT into account.
>
> Increase the value to a nice round 8192 to account for this and
> allow for short term future increases.
>
> so on the x86 side, people have actually done these things.
>
> Other architectures? I think some IBM power9 machines can hit 192
> cores (with SMT4 - so NR_CPUS of 768), but I don't think there's been
> an equivalent of an SGI for anything but x86.
>
> But admittedly I haven't checked or followed those things. I could
> easily imagine some boutique super-beefy setup.

POWER10 was just announced with threads 1920 using SMT8,
I think the latest s390 and sparc64 (from 2017) are in the same
ballpark when using SMT. The largest arm64 I know of was ThunderX3
with 768 threads on dual-socket machines. This got cancelled before
it was shipped to customers, but it's likely that others will exceed that
in the future.

Arnd

2021-09-08 17:46:40

by Wei Liu

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 08, 2021 at 02:51:21PM +0000, David Laight wrote:
> From: Wei Liu
> > Sent: 08 September 2021 11:03
> ...
> > However calling into the allocator from that IPI path seems very heavy
> > weight. I will discuss with fellow engineers on how to fix it properly.
>
> Isn't the IPI code something that is likely to get called
> when a lot of stack has already been used?
>
> So you really shouldn't be using much stack at all??

I don't follow your questions. I don't dispute there is a problem. I
just think calling into the allocator is not a good idea in that
particular piece of code we need to fix.

Hopefully we can come up with a solution to remove need for a cpumask in
that code -- discussion is on-going.

Wei.

>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

2021-09-08 17:48:09

by David Laight

[permalink] [raw]
Subject: RE: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

From: Wei Liu
> Sent: 08 September 2021 16:24
>
> On Wed, Sep 08, 2021 at 02:51:21PM +0000, David Laight wrote:
> > From: Wei Liu
> > > Sent: 08 September 2021 11:03
> > ...
> > > However calling into the allocator from that IPI path seems very heavy
> > > weight. I will discuss with fellow engineers on how to fix it properly.
> >
> > Isn't the IPI code something that is likely to get called
> > when a lot of stack has already been used?
> >
> > So you really shouldn't be using much stack at all??
>
> I don't follow your questions. I don't dispute there is a problem. I
> just think calling into the allocator is not a good idea in that
> particular piece of code we need to fix.
>
> Hopefully we can come up with a solution to remove need for a cpumask in
> that code -- discussion is on-going.

I'm pretty sure the IPI interrupt is high priority so can
nest with another interrupt.
(You certainly want it to be that way.)

So the kernel may already be running on the interrupt stack.
If the interrupted ISR code has used a lot of stack then
there may not be as much left as you might expect.

Many years ago (nearly 40!) I wrote something that did static
stack depth analysis for an embedded system.
Since there were no (interesting) indirect calls an no recursion
it wasn't completely impossible.
What it showed was that the deepest stack use was in error
trace paths that probably never happened.
I suspect the same is true for Linux - the deepest points
will be inside printk() in obscure error paths.
Get an IPI while in a printk() from deep inside an ISR
and you may not have the amount of stack you expect.

It might be possible to use the clang 'control flow integrity'
information to determine the actual maximum stack use even
for indirectly called functions.
I suspect that would be an eye-opener....

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2021-09-08 17:49:13

by Shuah Khan

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On 9/8/21 11:05 AM, Arnd Bergmann wrote:
> On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan <[email protected]> wrote:
>> On 9/7/21 5:14 PM, Linus Torvalds wrote:
>>> The KUNIT macros create all these individually reasonably small
>>> initialized structures on stack, and when you have more than a small
>>> handful of them the KUNIT infrastructure just makes the stack space
>>> explode. Sometimes the compiler will be able to re-use the stack
>>> slots, but it seems to be an iffy proposition to depend on it - it
>>> seems to be a combination of luck and various config options.
>>>
>>
>> I have been concerned about these macros creeping in for a while.
>> I will take a closer look and work with Brendan to come with a plan
>> to address it.
>
> I've previously sent patches to turn off the structleak plugin for
> any kunit test file to work around this, but only a few of those patches
> got merged and new files have been added since. It would
> definitely help to come up with a proper fix, but my structleak-disable
> hack should be sufficient as a quick fix.
>

Looks like these are RFC patches and the discussion went cold. Let's pick
this back up and we can make progress.

https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUgnLBg@mail.gmail.com/

thanks,
-- Shuah

2021-09-08 18:15:36

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan <[email protected]> wrote:
> On 9/7/21 5:14 PM, Linus Torvalds wrote:
> > The KUNIT macros create all these individually reasonably small
> > initialized structures on stack, and when you have more than a small
> > handful of them the KUNIT infrastructure just makes the stack space
> > explode. Sometimes the compiler will be able to re-use the stack
> > slots, but it seems to be an iffy proposition to depend on it - it
> > seems to be a combination of luck and various config options.
> >
>
> I have been concerned about these macros creeping in for a while.
> I will take a closer look and work with Brendan to come with a plan
> to address it.

I've previously sent patches to turn off the structleak plugin for
any kunit test file to work around this, but only a few of those patches
got merged and new files have been added since. It would
definitely help to come up with a proper fix, but my structleak-disable
hack should be sufficient as a quick fix.

Arnd

2021-09-08 19:02:14

by Wei Liu

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 08, 2021 at 08:59:36AM -0700, Linus Torvalds wrote:
> On Wed, Sep 8, 2021 at 3:03 AM Wei Liu <[email protected]> wrote:
> >
> > Thanks for the heads-up. I found one instance of this bad practice in
> > hv_apic.c. Presumably that's the one you were referring to.
>
> Yeah, that must have been the one I saw.
>
> > However calling into the allocator from that IPI path seems very heavy
> > weight. I will discuss with fellow engineers on how to fix it properly.
>
> In other places, the options have been fairly straightforward:
>
> (a) avoid the allocation entirely.
>
> I think the main reason hyperv does it is because it wants to clear
> the "current cpu" from the cpumask for the ALL_BUT_SELF case, and if
> you can just instead keep track of that "all but self" bit separately
> and pass it down the call chain, you may not need that allocation at
> all.
[..]
>
> That said, if you are already just iterating over the mask, doing (a)
> may be trivial. No allocation at all is even better than a percpu one.
>

Yep. I just wrote two patches for this approach.

Wei.

2021-09-08 21:27:59

by Brendan Higgins

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 10:16 AM Shuah Khan <[email protected]> wrote:
>
> On 9/8/21 11:05 AM, Arnd Bergmann wrote:
> > On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan <[email protected]> wrote:
> >> On 9/7/21 5:14 PM, Linus Torvalds wrote:
> >>> The KUNIT macros create all these individually reasonably small
> >>> initialized structures on stack, and when you have more than a small
> >>> handful of them the KUNIT infrastructure just makes the stack space
> >>> explode. Sometimes the compiler will be able to re-use the stack
> >>> slots, but it seems to be an iffy proposition to depend on it - it
> >>> seems to be a combination of luck and various config options.
> >>>
> >>
> >> I have been concerned about these macros creeping in for a while.
> >> I will take a closer look and work with Brendan to come with a plan
> >> to address it.
> >
> > I've previously sent patches to turn off the structleak plugin for
> > any kunit test file to work around this, but only a few of those patches
> > got merged and new files have been added since. It would
> > definitely help to come up with a proper fix, but my structleak-disable
> > hack should be sufficient as a quick fix.
> >
>
> Looks like these are RFC patches and the discussion went cold. Let's pick
> this back up and we can make progress.
>
> https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUgnLBg@mail.gmail.com/

I can try to get the patch reapplying and send it out (I just figured
that Arnd or Kees would want to send it out :-) since it was your
idea).

I definitely agree that in the cases where KUnit is not actually
contributing to blowing the stack - struct leak just thinks it is,
this is fine; however, it sounds like Linus' concerns with KUnit's
macros go deeper than this. Arnd, I think you sketched out a way to
make the KUNIT_* macros take up less space, but after some
investigation we found that it was pretty inflexible.

Ideally test cases should never get big enough for KUNIT_* macros to
be a problem (when they do it is usually an indication that your test
case is trying to do too many things); nevertheless, we are still in
this situation.

I think I will need to dust off some cobwebs out of my brain to
remember why I didn't like the idea of making the KUNIT_* macros take
up less stack space.

2021-09-08 22:37:48

by Linus Torvalds

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Wed, Sep 8, 2021 at 2:25 PM Brendan Higgins
<[email protected]> wrote:
>
> I definitely agree that in the cases where KUnit is not actually
> contributing to blowing the stack - struct leak just thinks it is,
> this is fine; however, it sounds like Linus' concerns with KUnit's
> macros go deeper than this.

I don't mind Kunit tests when they don't cause problems, but one very
natural way to use the Kunit test infrastructure does seem to be to
just put a lot of them into one function.

And then the individually fairly small structures just add up.
Probably mainly in some special configurations (ie together with
CONFIG_KASAN_STACK as pointed out by Arnd, but there might be other
cases that cause that issue too) where the compiler then doesn't merge
stack slots.

I wonder if those 'kunit_assert' structures could be split into two:
one part that could be 'static const', and at least shrink the dynamic
stack use that way. Because at a minimun, things like
type/file/line/format-msg seem to be things that really are just
static and const.

Hmm?

Linus

2021-09-14 01:10:52

by Shuah Khan

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On 9/8/21 3:24 PM, Brendan Higgins wrote:
> On Wed, Sep 8, 2021 at 10:16 AM Shuah Khan <[email protected]> wrote:
>>
>> On 9/8/21 11:05 AM, Arnd Bergmann wrote:
>>> On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan <[email protected]> wrote:
>>>> On 9/7/21 5:14 PM, Linus Torvalds wrote:
>>>>> The KUNIT macros create all these individually reasonably small
>>>>> initialized structures on stack, and when you have more than a small
>>>>> handful of them the KUNIT infrastructure just makes the stack space
>>>>> explode. Sometimes the compiler will be able to re-use the stack
>>>>> slots, but it seems to be an iffy proposition to depend on it - it
>>>>> seems to be a combination of luck and various config options.
>>>>>
>>>>
>>>> I have been concerned about these macros creeping in for a while.
>>>> I will take a closer look and work with Brendan to come with a plan
>>>> to address it.
>>>
>>> I've previously sent patches to turn off the structleak plugin for
>>> any kunit test file to work around this, but only a few of those patches
>>> got merged and new files have been added since. It would
>>> definitely help to come up with a proper fix, but my structleak-disable
>>> hack should be sufficient as a quick fix.
>>>
>>
>> Looks like these are RFC patches and the discussion went cold. Let's pick
>> this back up and we can make progress.
>>
>> https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUgnLBg@mail.gmail.com/
>
> I can try to get the patch reapplying and send it out (I just figured
> that Arnd or Kees would want to send it out :-) since it was your
> idea).
>

Brendan,

Would you like to send me the fix with Suggested-by for Arnd or Kees?

thanks,
-- Shuah

2021-09-14 20:50:29

by Brendan Higgins

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan <[email protected]> wrote:
>
> On 9/8/21 3:24 PM, Brendan Higgins wrote:
> > On Wed, Sep 8, 2021 at 10:16 AM Shuah Khan <[email protected]> wrote:
> >>
> >> On 9/8/21 11:05 AM, Arnd Bergmann wrote:
> >>> On Wed, Sep 8, 2021 at 4:12 PM Shuah Khan <[email protected]> wrote:
> >>>> On 9/7/21 5:14 PM, Linus Torvalds wrote:
> >>>>> The KUNIT macros create all these individually reasonably small
> >>>>> initialized structures on stack, and when you have more than a small
> >>>>> handful of them the KUNIT infrastructure just makes the stack space
> >>>>> explode. Sometimes the compiler will be able to re-use the stack
> >>>>> slots, but it seems to be an iffy proposition to depend on it - it
> >>>>> seems to be a combination of luck and various config options.
> >>>>>
> >>>>
> >>>> I have been concerned about these macros creeping in for a while.
> >>>> I will take a closer look and work with Brendan to come with a plan
> >>>> to address it.
> >>>
> >>> I've previously sent patches to turn off the structleak plugin for
> >>> any kunit test file to work around this, but only a few of those patches
> >>> got merged and new files have been added since. It would
> >>> definitely help to come up with a proper fix, but my structleak-disable
> >>> hack should be sufficient as a quick fix.
> >>>
> >>
> >> Looks like these are RFC patches and the discussion went cold. Let's pick
> >> this back up and we can make progress.
> >>
> >> https://lore.kernel.org/lkml/CAFd5g45+JqKDqewqz2oZtnphA-_0w62FdSTkRs43K_NJUgnLBg@mail.gmail.com/
> >
> > I can try to get the patch reapplying and send it out (I just figured
> > that Arnd or Kees would want to send it out :-) since it was your
> > idea).
> >
>
> Brendan,
>
> Would you like to send me the fix with Suggested-by for Arnd or Kees?

So it looks like Arnd's fix was accepted (whether by him or someone
else) for property-entry-test and Linus already fixed thunderbolt, so
the only remaining of Arnd's patches is for the bitfield test, so I'll
resend that one in a bit.

Also, I haven't actually tried Linus' suggestion yet, but the logic is
sound and the change *should* be fairly unintrusive - I am going to
give that a try and report back (but I will get the bitfield
structleak disable patch out first since I already got that applying).

2021-09-14 22:05:02

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins
<[email protected]> wrote:
>
> On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan <[email protected]> wrote:
> >
> > On 9/8/21 3:24 PM, Brendan Higgins wrote:
> > Brendan,
> >
> > Would you like to send me the fix with Suggested-by for Arnd or Kees?
>
> So it looks like Arnd's fix was accepted (whether by him or someone
> else) for property-entry-test and Linus already fixed thunderbolt, so
> the only remaining of Arnd's patches is for the bitfield test, so I'll
> resend that one in a bit.
>
> Also, I haven't actually tried Linus' suggestion yet, but the logic is
> sound and the change *should* be fairly unintrusive - I am going to
> give that a try and report back (but I will get the bitfield
> structleak disable patch out first since I already got that applying).

Looking at my randconfig tree, I find these six instances:

$ git grep -w DISABLE_STRUCTLEAK_PLUGIN
drivers/base/test/Makefile:CFLAGS_property-entry-test.o +=
$(DISABLE_STRUCTLEAK_PLUGIN)
drivers/iio/test/Makefile:CFLAGS_iio-test-format.o +=
$(DISABLE_STRUCTLEAK_PLUGIN)
drivers/mmc/host/Makefile:CFLAGS_sdhci-of-aspeed.o +=
$(DISABLE_STRUCTLEAK_PLUGIN)
drivers/thunderbolt/Makefile:CFLAGS_test.o += $(DISABLE_STRUCTLEAK_PLUGIN)
lib/Makefile:CFLAGS_test_scanf.o += $(DISABLE_STRUCTLEAK_PLUGIN)
lib/Makefile:CFLAGS_bitfield_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN)
scripts/Makefile.gcc-plugins: DISABLE_STRUCTLEAK_PLUGIN +=
-fplugin-arg-structleak_plugin-disable
scripts/Makefile.gcc-plugins:export DISABLE_STRUCTLEAK_PLUGIN

Sorry for failing to submit these as a proper patch. If you send a new version,
I think you need to make sure you cover all of the above, using whichever
change you like best.

Arnd

2021-09-17 12:27:55

by Brendan Higgins

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Tue, Sep 14, 2021 at 3:04 PM Arnd Bergmann <[email protected]> wrote:
>
> On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins
> <[email protected]> wrote:
> >
> > On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan <[email protected]> wrote:
> > >
> > > On 9/8/21 3:24 PM, Brendan Higgins wrote:
> > > Brendan,
> > >
> > > Would you like to send me the fix with Suggested-by for Arnd or Kees?
> >
> > So it looks like Arnd's fix was accepted (whether by him or someone
> > else) for property-entry-test and Linus already fixed thunderbolt, so
> > the only remaining of Arnd's patches is for the bitfield test, so I'll
> > resend that one in a bit.
> >
> > Also, I haven't actually tried Linus' suggestion yet, but the logic is
> > sound and the change *should* be fairly unintrusive - I am going to
> > give that a try and report back (but I will get the bitfield
> > structleak disable patch out first since I already got that applying).
>
> Looking at my randconfig tree, I find these six instances:
>
> $ git grep -w DISABLE_STRUCTLEAK_PLUGIN
> drivers/base/test/Makefile:CFLAGS_property-entry-test.o +=
> $(DISABLE_STRUCTLEAK_PLUGIN)
> drivers/iio/test/Makefile:CFLAGS_iio-test-format.o +=
> $(DISABLE_STRUCTLEAK_PLUGIN)
> drivers/mmc/host/Makefile:CFLAGS_sdhci-of-aspeed.o +=
> $(DISABLE_STRUCTLEAK_PLUGIN)
> drivers/thunderbolt/Makefile:CFLAGS_test.o += $(DISABLE_STRUCTLEAK_PLUGIN)
> lib/Makefile:CFLAGS_test_scanf.o += $(DISABLE_STRUCTLEAK_PLUGIN)
> lib/Makefile:CFLAGS_bitfield_kunit.o += $(DISABLE_STRUCTLEAK_PLUGIN)
> scripts/Makefile.gcc-plugins: DISABLE_STRUCTLEAK_PLUGIN +=
> -fplugin-arg-structleak_plugin-disable
> scripts/Makefile.gcc-plugins:export DISABLE_STRUCTLEAK_PLUGIN

Alright, I incorporated all the above into a patchset that I think is
ready to send out, but I had a couple of issues with the above
suggestions:

- I could not find a config which causes a stacksize warning for
sdhci-of-aspeed.
- test_scanf is not a KUnit test.
- Linus already fixed the thunderbolt test by breaking up the test cases.

I am going to send out patches for the thunderbolt test and for the
sdhci-of-aspeed test for the sake of completeness, but I am not sure
if we should merge those two. I'll let y'all decide on the patch
review.

I only based the thunderbolt and bitfield test fixes on actual patches
from Arnd, but I think Arnd pretty much did all the work here so I am
crediting him with a Co-developed-by on all the other patches, so
Arnd: please follow up on the other patches with a signed-off-by,
unless you would rather me credit you in some other way.

> Sorry for failing to submit these as a proper patch. If you send a new version,
> I think you need to make sure you cover all of the above, using whichever
> change you like best.

I am still going to try to get Linus' suggestion working since it
actually solves the problem, but I would rather get the above
suggested fix out there since it is quick and I know it works.

Cheers

2021-09-17 17:14:13

by Brendan Higgins

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Thu, Sep 16, 2021 at 10:39 PM Brendan Higgins
<[email protected]> wrote:
>
> On Tue, Sep 14, 2021 at 3:04 PM Arnd Bergmann <[email protected]> wrote:
> >
> > On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins
> > <[email protected]> wrote:
> > >
> > > On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan <[email protected]> wrote:
> > > >
> > > > On 9/8/21 3:24 PM, Brendan Higgins wrote:
[...]
> Alright, I incorporated all the above into a patchset that I think is
> ready to send out, but I had a couple of issues with the above
> suggestions:
>
> - I could not find a config which causes a stacksize warning for
> sdhci-of-aspeed.
> - test_scanf is not a KUnit test.
> - Linus already fixed the thunderbolt test by breaking up the test cases.
>
> I am going to send out patches for the thunderbolt test and for the
> sdhci-of-aspeed test for the sake of completeness, but I am not sure
> if we should merge those two. I'll let y'all decide on the patch
> review.

Just in case I missed any interested parties on this thread, I posted
my patches here:

https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

> I only based the thunderbolt and bitfield test fixes on actual patches
> from Arnd, but I think Arnd pretty much did all the work here so I am
> crediting him with a Co-developed-by on all the other patches, so
> Arnd: please follow up on the other patches with a signed-off-by,
> unless you would rather me credit you in some other way.
>
> > Sorry for failing to submit these as a proper patch. If you send a new version,
> > I think you need to make sure you cover all of the above, using whichever
> > change you like best.
>
> I am still going to try to get Linus' suggestion working since it
> actually solves the problem, but I would rather get the above
> suggested fix out there since it is quick and I know it works.

2021-09-17 17:30:48

by Arnd Bergmann

[permalink] [raw]
Subject: Re: ipv4/tcp.c:4234:1: error: the frame size of 1152 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

On Fri, Sep 17, 2021 at 8:16 AM Brendan Higgins
<[email protected]> wrote:
>
> On Thu, Sep 16, 2021 at 10:39 PM Brendan Higgins
> <[email protected]> wrote:
> >
> > On Tue, Sep 14, 2021 at 3:04 PM Arnd Bergmann <[email protected]> wrote:
> > >
> > > On Tue, Sep 14, 2021 at 10:48 PM Brendan Higgins
> > > <[email protected]> wrote:
> > > >
> > > > On Mon, Sep 13, 2021 at 1:55 PM Shuah Khan <[email protected]> wrote:
> > > > >
> > > > > On 9/8/21 3:24 PM, Brendan Higgins wrote:
> [...]
> > Alright, I incorporated all the above into a patchset that I think is
> > ready to send out, but I had a couple of issues with the above
> > suggestions:

Thanks a lot for those suggestions.

> > - I could not find a config which causes a stacksize warning for
> > sdhci-of-aspeed.

I keep a history of my randconfig builds. This one only happened
once before I fixed it, it may depend on some other combination of
options. See my original defconfig file at
https://pastebin.com/raw/XJxjVGYa
rand/0xAB2DD5A0-failure:/git/arm-soc/drivers/mmc/host/sdhci-of-aspeed-test.c:47:1:
error: the frame size of 1152 bytes is larger than 1024 bytes
[-Werror=frame-larger-than=]

> > - test_scanf is not a KUnit test.

I have three defconfigs for this one, all on x86-64:

rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c: In function
'numbers_list_field_width_val_width':
rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c:530:1: error:
the frame size of 2488 bytes is larger than 2048 bytes
[-Werror=frame-larger-than=]
rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c: In function
'numbers_list_field_width_typemax':
rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c:488:1: error:
the frame size of 2968 bytes is larger than 2048 bytes
[-Werror=frame-larger-than=]
rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c: In function
'numbers_list':
rand86/0x30AD57FB-failure:/git/arm-soc/lib/test_scanf.c:437:1: error:
the frame size of 2488 bytes is larger than 2048 bytes
[-Werror=frame-larger-than=]

https://pastebin.com/raw/jUdY6d3G is the worst one of those

> > - Linus already fixed the thunderbolt test by breaking up the test cases.

Ok.

> > I am going to send out patches for the thunderbolt test and for the
> > sdhci-of-aspeed test for the sake of completeness, but I am not sure
> > if we should merge those two. I'll let y'all decide on the patch
> > review.
>
> Just in case I missed any interested parties on this thread, I posted
> my patches here:
>
> https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Thanks! I'll reply to the particular patch as well, but I don't think
that this is
sufficient here:

+CFLAGS_bitfield_kunit.o := $(call
cc-option,-Wframe-larger-than=10240) $(DISABLE_STRUCTLEAK_PLUGIN)

If 10KB is actually needed, this definitely overflows the 8KB stack on
32-bit architectures.

Arnd