2021-05-19 20:08:47

by Michal Suchánek

[permalink] [raw]
Subject: BPF: failed module verification on linux-next

Hello,

linux-next fails to boot for me:

[ 0.000000] Linux version 5.13.0-rc2-next-20210519-1.g3455ff8-vanilla (geeko@buildhost) (gcc (SUSE Linux) 10.3.0, GNU ld (GNU Binutils;
openSUSE Tumbleweed) 2.36.1.20210326-3) #1 SMP Wed May 19 10:05:10 UTC 2021 (3455ff8)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla root=UUID=ec42c33e-a2c2-4c61-afcc-93e9527
8f687 plymouth.enable=0 resume=/dev/disk/by-uuid/f1fe4560-a801-4faf-a638-834c407027c7 mitigations=auto earlyprintk initcall_debug nomodeset
earlycon ignore_loglevel console=ttyS0,115200
...
[ 26.093364] calling tracing_set_default_clock+0x0/0x62 @ 1
[ 26.098937] initcall tracing_set_default_clock+0x0/0x62 returned 0 after 0 usecs
[ 26.106330] calling acpi_gpio_handle_deferred_request_irqs+0x0/0x7c @ 1
[ 26.113033] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0x7c returned 0 after 3 usecs
[ 26.121559] calling clk_disable_unused+0x0/0x102 @ 1
[ 26.126620] initcall clk_disable_unused+0x0/0x102 returned 0 after 0 usecs
[ 26.133491] calling regulator_init_complete+0x0/0x25 @ 1
[ 26.138890] initcall regulator_init_complete+0x0/0x25 returned 0 after 0 usecs
[ 26.147816] Freeing unused decrypted memory: 2036K
[ 26.153682] Freeing unused kernel image (initmem) memory: 2308K
[ 26.165776] Write protecting the kernel read-only data: 26624k
[ 26.173067] Freeing unused kernel image (text/rodata gap) memory: 2036K
[ 26.180416] Freeing unused kernel image (rodata/data gap) memory: 1184K
[ 26.187031] Run /init as init process
[ 26.190693] with arguments:
[ 26.193661] /init
[ 26.195933] with environment:
[ 26.199079] HOME=/
[ 26.201444] TERM=linux
[ 26.204152] BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla
[ 26.254154] BPF: type_id=35503 offset=178440 size=4
[ 26.259125] BPF:
[ 26.261054] BPF:Invalid offset
[ 26.264119] BPF:
[ 26.264119]
[ 26.267437] failed to validate module [efivarfs] BTF: -22
[ 26.316724] systemd[1]: systemd 246.13+suse.105.g14581e0120 running in system mode. (+PAM +AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINI
T +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[ 26.357990] systemd[1]: Detected architecture x86-64.
[ 26.363068] systemd[1]: Running in initial RAM disk.

Welcome to openSUSE Tumbleweed dracut-053+suse.93.g039ac07d-1.1 (Initramfs)!

[ 26.386019] systemd[1]: No hostname configured.
[ 26.390579] systemd[1]: Set hostname to <localhost>.
[ 26.519617] systemd[1]: /usr/lib/systemd/system/plymouth-start.service:15: Unit configured to use KillMode=none. This is unsafe, as it d
isables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'co
ntrol-group'. Support for KillMode=none is deprecated and will eventually be removed.
[ 26.559097] systemd[1]: Queued start job for default target Initrd Default Target.
[ 26.566795] systemd[1]: Reached target Local File Systems.
[ OK ] Reached target Local File Systems.
[ 26.589959] systemd[1]: Reached target Slices.
[ OK ] Reached target Slices.
[ 26.609968] systemd[1]: Reached target Swap.
[ OK ] Reached target Swap.
[ 26.629935] systemd[1]: Reached target Timers.
[ OK ] Reached target Timers.
[ 26.650105] systemd[1]: Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket (/dev/log).
[ 26.670121] systemd[1]: Listening on Journal Socket.
[ OK ] Listening on Journal Socket.
[ 26.690164] systemd[1]: Listening on udev Control Socket.
[ OK ] Listening on udev Control Socket.
[ 26.710090] systemd[1]: Listening on udev Kernel Socket.
[ OK ] Listening on udev Kernel Socket.
[ 26.729944] systemd[1]: Reached target Sockets.
[ OK ] Reached target Sockets.
[ 26.754840] systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
[ OK ] Started Entropy Daemon based on the HAVEGE algorithm.
[ 26.780471] systemd[1]: Starting Create list of static device nodes for the current kernel...
Starting Create list of st…odes for the current kernel...
[ 26.806030] systemd[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firew
alling.
[ 26.818903] systemd[1]: (This warning is only shown for the first unit using IP firewalling.)
[ 26.829047] systemd[1]: Starting Journal Service...
Starting Journal Service...
[ 26.851528] systemd[1]: Starting Load Kernel Modules...
Starting Load Kernel Modules...
[ 26.865660] failed to validate module [scsi_dh_alua] BTF: -22
[ 26.873543] systemd[1]: Starting Setup Virtual Console...
Starting Setup Virtual Console...
[ 26.894776] systemd[1]: Finished Create list of static device nodes for the current kernel.
[ OK ] Finished Create list of st… nodes for the current kernel.
[ 26.922498] systemd[1]: Finished Setup Virtual Console.
[ OK ] Finished Setup Virt[ 26.930629] failed to validate module [scsi_dh_emc] BTF: -22
ual Console.
[ 26.951836] systemd[1]: Starting dracut ask for additional cmdline parameters...
Starting dracu[ 26.961107] systemd[1]: Starting Create Static Device Nodes in /dev...
t ask for additional cmdline parameters...
[ 26.972462] systemd[1]: Finished Create Static Device Nodes in /dev.
Starting Create Static Device Nodes in /dev...
[ OK ] Finished Create Static Device Nodes in /dev.
[ 27.002870] systemd[1]: Finished dracut ask for additional cmdline parameters.
[ OK ] Finished dracut ask[ 27.013660] failed to validate module [scsi_dh_rdac] BTF: -22
for additional cmdline parameters.
[ 27.036033] systemd[1]: Starting dracut cmdline hook...
Starting dracut cmdline hook...
[ 27.065506] systemd[1]: Started Journal Service.
[ OK ] Started Journal Service.
Starting Create Volatile Files and Directories...
[ OK ] Finished Create Volatile Files and Directories.
[ 27.113543] failed to validate module [dm_mod] BTF: -22
[ OK ] Finished dracut cmdline hook.
Starting dracut pre-udev hook...
[ 27.166641] failed to validate module [sg] BTF: -22
[ 27.204430] failed to validate module [raid6_pq] BTF: -22
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
Starting Apply Kernel Variables...
[ OK ] Finished Apply Kernel Variables.
[ OK ] Finished dracut pre-udev hook.
Starting Rule-based Manage…for Device Events and Files...
[ OK ] Started Rule-based Manager for Device Events and Files.
Starting Coldplug All udev Devices...
[ 27.648960] failed to validate module [wmi] BTF: -22
[ OK ] Finished Coldplug All udev Devices.
[ 27.668587] failed to validate module [nvme_core] BTF: -22
[ 27.674538] failed to validate module [usbcore] BTF: -22
[ 27.680143] failed to validate module [cryptd] BTF: -22
[ 27.689045] failed to validate module [drm] BTF: -22
[ OK ] Reached target System Initialization.
Starting dracut initqueue hook...
[ OK ] Started Dispatch Password …ts to Console Directory Watch.
[ OK ] Reached target Paths.
[ OK ] Reached targ[ 27.767607] failed to validate module [wmi] BTF: -22
et Basic System.
[ 27.879070] failed to validate module [cryptd] BTF: -22
[ 27.885630] failed to validate module [nvme_core] BTF: -22
[ 27.931144] failed to validate module [cryptd] BTF: -22
[ 28.023351] failed to validate module [cryptd] BTF: -22
[ 28.028994] failed to validate module [crc32c_intel] BTF: -22
[ 28.099040] failed to validate module [cryptd] BTF: -22
[ 28.104698] failed to validate module [crc32_pclmul] BTF: -22
[ 28.139039] failed to validate module [cryptd] BTF: -22
[ 28.168507] failed to validate module [crct10dif_pclmul] BTF: -22
[ 28.207039] failed to validate module [cryptd] BTF: -22
[ 28.271097] failed to validate module [cryptd] BTF: -22
[ 28.315053] failed to validate module [cryptd] BTF: -22
[ 28.363026] failed to validate module [cryptd] BTF: -22
[ 28.419114] failed to validate module [cryptd] BTF: -22
[ 28.424769] failed to validate module [crc32c_intel] BTF: -22
[ 28.467243] failed to validate module [cryptd] BTF: -22
[ 28.496341] failed to validate module [crc32_pclmul] BTF: -22
[ 28.539048] failed to validate module [cryptd] BTF: -22
[ 28.568428] failed to validate module [crct10dif_pclmul] BTF: -22
[ 28.619016] failed to validate module [cryptd] BTF: -22
[ 28.687076] failed to validate module [cryptd] BTF: -22
[ 28.731026] failed to validate module [cryptd] BTF: -22
[ 28.771116] failed to validate module [cryptd] BTF: -22
[ 28.835097] failed to validate module [cryptd] BTF: -22
[ 28.883095] failed to validate module [cryptd] BTF: -22
[ 28.935043] failed to validate module [cryptd] BTF: -22
[ 28.940709] failed to validate module [crc32c_intel] BTF: -22
[ 28.986972] failed to validate module [cryptd] BTF: -22
[ 29.020403] failed to validate module [crc32_pclmul] BTF: -22
[ 29.104420] failed to validate module [crct10dif_pclmul] BTF: -22
...


Attachments:
(No filename) (9.14 kB)
vanilla (255.33 kB)
Download all attachments

2021-05-21 09:46:29

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Wed, May 19, 2021 at 7:19 AM Michal Suchánek <[email protected]> wrote:
>
> Hello,
>
> linux-next fails to boot for me:
>
> [ 0.000000] Linux version 5.13.0-rc2-next-20210519-1.g3455ff8-vanilla (geeko@buildhost) (gcc (SUSE Linux) 10.3.0, GNU ld (GNU Binutils;
> openSUSE Tumbleweed) 2.36.1.20210326-3) #1 SMP Wed May 19 10:05:10 UTC 2021 (3455ff8)
> [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla root=UUID=ec42c33e-a2c2-4c61-afcc-93e9527
> 8f687 plymouth.enable=0 resume=/dev/disk/by-uuid/f1fe4560-a801-4faf-a638-834c407027c7 mitigations=auto earlyprintk initcall_debug nomodeset
> earlycon ignore_loglevel console=ttyS0,115200
> ...
> [ 26.093364] calling tracing_set_default_clock+0x0/0x62 @ 1
> [ 26.098937] initcall tracing_set_default_clock+0x0/0x62 returned 0 after 0 usecs
> [ 26.106330] calling acpi_gpio_handle_deferred_request_irqs+0x0/0x7c @ 1
> [ 26.113033] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0x7c returned 0 after 3 usecs
> [ 26.121559] calling clk_disable_unused+0x0/0x102 @ 1
> [ 26.126620] initcall clk_disable_unused+0x0/0x102 returned 0 after 0 usecs
> [ 26.133491] calling regulator_init_complete+0x0/0x25 @ 1
> [ 26.138890] initcall regulator_init_complete+0x0/0x25 returned 0 after 0 usecs
> [ 26.147816] Freeing unused decrypted memory: 2036K
> [ 26.153682] Freeing unused kernel image (initmem) memory: 2308K
> [ 26.165776] Write protecting the kernel read-only data: 26624k
> [ 26.173067] Freeing unused kernel image (text/rodata gap) memory: 2036K
> [ 26.180416] Freeing unused kernel image (rodata/data gap) memory: 1184K
> [ 26.187031] Run /init as init process
> [ 26.190693] with arguments:
> [ 26.193661] /init
> [ 26.195933] with environment:
> [ 26.199079] HOME=/
> [ 26.201444] TERM=linux
> [ 26.204152] BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla
> [ 26.254154] BPF: type_id=35503 offset=178440 size=4
> [ 26.259125] BPF:
> [ 26.261054] BPF:Invalid offset
> [ 26.264119] BPF:

It took me a while to reliably bisect this, but it clearly points to
this commit:

e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")

One commit before it, 676535512684 ("mm/page_alloc: split per cpu page
lists and zone stats -fix"), works just fine.

I'll have to spend more time debugging what exactly is happening, but
the immediate problem is two different definitions of numa_node
per-cpu variable. They both are at the same offset within
.data..percpu ELF section, they both have the same name, but one of
them is marked as static and another as global. And one is int
variable, while another is struct pagesets. I'll look some more
tomorrow, but adding Jiri and Arnaldo for visibility.

[110907] DATASEC '.data..percpu' size=178904 vlen=303
...
type_id=27753 offset=163976 size=4 (VAR 'numa_node')
type_id=27754 offset=163976 size=4 (VAR 'numa_node')

[27753] VAR 'numa_node' type_id=27556, linkage=static
[27754] VAR 'numa_node' type_id=20, linkage=global

[20] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED

[27556] STRUCT 'pagesets' size=0 vlen=1
'lock' type_id=507 bits_offset=0

[506] STRUCT '(anon)' size=0 vlen=0
[507] TYPEDEF 'local_lock_t' type_id=506

So also something weird about those zero-sized struct pagesets and
local_lock_t inside it.

> [ 26.264119]
> [ 26.267437] failed to validate module [efivarfs] BTF: -22
> [ 26.316724] systemd[1]: systemd 246.13+suse.105.g14581e0120 running in system mode. (+PAM +AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINI
> T +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
> [ 26.357990] systemd[1]: Detected architecture x86-64.
> [ 26.363068] systemd[1]: Running in initial RAM disk.
>

[...]

2021-05-22 19:46:52

by Hritik Vijay

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Thu, May 20, 2021 at 10:31:18PM -0700, Andrii Nakryiko wrote:
> It took me a while to reliably bisect this, but it clearly points to
> this commit:
>
> e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
>
> One commit before it, 676535512684 ("mm/page_alloc: split per cpu page
> lists and zone stats -fix"), works just fine.

Thank you for pointing this out. I'm facing the same issue.
I've posted my logs in the following thread
https://lore.kernel.org/linux-next/[email protected]/T/#t

Hrtk

2021-05-23 23:49:20

by Hritik Vijay

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Thu, May 20, 2021 at 10:31:18PM -0700, Andrii Nakryiko wrote:
> It took me a while to reliably bisect this, but it clearly points to
> this commit:
>
> e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
>
I tried compiling 5.13.0-rc1-next-20210514 and observed the same error.
5.13.0-rc1-next-20210513 boots fine for me though.
Could there be that there's something more to this ?
I could try to recompile 5.13.0-rc1-next-20210514 but I'm pretty sure
that it didn't boot.

Hrtk

2021-05-24 23:01:20

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Thu, May 20, 2021 at 10:31 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Wed, May 19, 2021 at 7:19 AM Michal Suchánek <[email protected]> wrote:
> >
> > Hello,
> >
> > linux-next fails to boot for me:
> >
> > [ 0.000000] Linux version 5.13.0-rc2-next-20210519-1.g3455ff8-vanilla (geeko@buildhost) (gcc (SUSE Linux) 10.3.0, GNU ld (GNU Binutils;
> > openSUSE Tumbleweed) 2.36.1.20210326-3) #1 SMP Wed May 19 10:05:10 UTC 2021 (3455ff8)
> > [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla root=UUID=ec42c33e-a2c2-4c61-afcc-93e9527
> > 8f687 plymouth.enable=0 resume=/dev/disk/by-uuid/f1fe4560-a801-4faf-a638-834c407027c7 mitigations=auto earlyprintk initcall_debug nomodeset
> > earlycon ignore_loglevel console=ttyS0,115200
> > ...
> > [ 26.093364] calling tracing_set_default_clock+0x0/0x62 @ 1
> > [ 26.098937] initcall tracing_set_default_clock+0x0/0x62 returned 0 after 0 usecs
> > [ 26.106330] calling acpi_gpio_handle_deferred_request_irqs+0x0/0x7c @ 1
> > [ 26.113033] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0x7c returned 0 after 3 usecs
> > [ 26.121559] calling clk_disable_unused+0x0/0x102 @ 1
> > [ 26.126620] initcall clk_disable_unused+0x0/0x102 returned 0 after 0 usecs
> > [ 26.133491] calling regulator_init_complete+0x0/0x25 @ 1
> > [ 26.138890] initcall regulator_init_complete+0x0/0x25 returned 0 after 0 usecs
> > [ 26.147816] Freeing unused decrypted memory: 2036K
> > [ 26.153682] Freeing unused kernel image (initmem) memory: 2308K
> > [ 26.165776] Write protecting the kernel read-only data: 26624k
> > [ 26.173067] Freeing unused kernel image (text/rodata gap) memory: 2036K
> > [ 26.180416] Freeing unused kernel image (rodata/data gap) memory: 1184K
> > [ 26.187031] Run /init as init process
> > [ 26.190693] with arguments:
> > [ 26.193661] /init
> > [ 26.195933] with environment:
> > [ 26.199079] HOME=/
> > [ 26.201444] TERM=linux
> > [ 26.204152] BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla
> > [ 26.254154] BPF: type_id=35503 offset=178440 size=4
> > [ 26.259125] BPF:
> > [ 26.261054] BPF:Invalid offset
> > [ 26.264119] BPF:
>
> It took me a while to reliably bisect this, but it clearly points to
> this commit:
>
> e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
>
> One commit before it, 676535512684 ("mm/page_alloc: split per cpu page
> lists and zone stats -fix"), works just fine.
>
> I'll have to spend more time debugging what exactly is happening, but
> the immediate problem is two different definitions of numa_node
> per-cpu variable. They both are at the same offset within
> .data..percpu ELF section, they both have the same name, but one of
> them is marked as static and another as global. And one is int
> variable, while another is struct pagesets. I'll look some more
> tomorrow, but adding Jiri and Arnaldo for visibility.
>
> [110907] DATASEC '.data..percpu' size=178904 vlen=303
> ...
> type_id=27753 offset=163976 size=4 (VAR 'numa_node')
> type_id=27754 offset=163976 size=4 (VAR 'numa_node')
>
> [27753] VAR 'numa_node' type_id=27556, linkage=static
> [27754] VAR 'numa_node' type_id=20, linkage=global
>
> [20] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>
> [27556] STRUCT 'pagesets' size=0 vlen=1
> 'lock' type_id=507 bits_offset=0
>
> [506] STRUCT '(anon)' size=0 vlen=0
> [507] TYPEDEF 'local_lock_t' type_id=506
>
> So also something weird about those zero-sized struct pagesets and
> local_lock_t inside it.

Ok, so nothing weird about them. local_lock_t is designed to be
zero-sized unless CONFIG_DEBUG_LOCK_ALLOC is defined.

But such zero-sized per-CPU variables are confusing pahole during BTF
generation, as now two different variables "occupy" the same address.

Given this seems to be the first zero-sized per-CPU variable, I wonder
if it would be ok to make sure it's never zero-sized, while pahole
gets fixed and it's latest version gets widely packaged and
distributed.

Mel, what do you think about something like below? Or maybe you can
advise some better solution?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 41b87d6f840c..6a1d7511cae9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -124,6 +124,13 @@ static DEFINE_MUTEX(pcp_batch_high_lock);

struct pagesets {
local_lock_t lock;
+#if defined(CONFIG_DEBUG_INFO_BTF) && !defined(CONFIG_DEBUG_LOCK_ALLOC)
+ /* pahole 1.21 and earlier gets confused by zero-sized per-CPU
+ * variables and produces invalid BTF. So to accommodate earlier
+ * versions of pahole, ensure that sizeof(struct pagesets) is never 0.
+ */
+ char __filler;
+#endif
};
static DEFINE_PER_CPU(struct pagesets, pagesets) = {
.lock = INIT_LOCAL_LOCK(lock),

>
> > [ 26.264119]
> > [ 26.267437] failed to validate module [efivarfs] BTF: -22
> > [ 26.316724] systemd[1]: systemd 246.13+suse.105.g14581e0120 running in system mode. (+PAM +AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINI
> > T +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
> > [ 26.357990] systemd[1]: Detected architecture x86-64.
> > [ 26.363068] systemd[1]: Running in initial RAM disk.
> >
>
> [...]

2021-05-24 23:47:37

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Mon, May 24, 2021 at 3:58 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Thu, May 20, 2021 at 10:31 PM Andrii Nakryiko
> <[email protected]> wrote:
> >
> > On Wed, May 19, 2021 at 7:19 AM Michal Suchánek <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > linux-next fails to boot for me:
> > >
> > > [ 0.000000] Linux version 5.13.0-rc2-next-20210519-1.g3455ff8-vanilla (geeko@buildhost) (gcc (SUSE Linux) 10.3.0, GNU ld (GNU Binutils;
> > > openSUSE Tumbleweed) 2.36.1.20210326-3) #1 SMP Wed May 19 10:05:10 UTC 2021 (3455ff8)
> > > [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla root=UUID=ec42c33e-a2c2-4c61-afcc-93e9527
> > > 8f687 plymouth.enable=0 resume=/dev/disk/by-uuid/f1fe4560-a801-4faf-a638-834c407027c7 mitigations=auto earlyprintk initcall_debug nomodeset
> > > earlycon ignore_loglevel console=ttyS0,115200
> > > ...
> > > [ 26.093364] calling tracing_set_default_clock+0x0/0x62 @ 1
> > > [ 26.098937] initcall tracing_set_default_clock+0x0/0x62 returned 0 after 0 usecs
> > > [ 26.106330] calling acpi_gpio_handle_deferred_request_irqs+0x0/0x7c @ 1
> > > [ 26.113033] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0x7c returned 0 after 3 usecs
> > > [ 26.121559] calling clk_disable_unused+0x0/0x102 @ 1
> > > [ 26.126620] initcall clk_disable_unused+0x0/0x102 returned 0 after 0 usecs
> > > [ 26.133491] calling regulator_init_complete+0x0/0x25 @ 1
> > > [ 26.138890] initcall regulator_init_complete+0x0/0x25 returned 0 after 0 usecs
> > > [ 26.147816] Freeing unused decrypted memory: 2036K
> > > [ 26.153682] Freeing unused kernel image (initmem) memory: 2308K
> > > [ 26.165776] Write protecting the kernel read-only data: 26624k
> > > [ 26.173067] Freeing unused kernel image (text/rodata gap) memory: 2036K
> > > [ 26.180416] Freeing unused kernel image (rodata/data gap) memory: 1184K
> > > [ 26.187031] Run /init as init process
> > > [ 26.190693] with arguments:
> > > [ 26.193661] /init
> > > [ 26.195933] with environment:
> > > [ 26.199079] HOME=/
> > > [ 26.201444] TERM=linux
> > > [ 26.204152] BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla
> > > [ 26.254154] BPF: type_id=35503 offset=178440 size=4
> > > [ 26.259125] BPF:
> > > [ 26.261054] BPF:Invalid offset
> > > [ 26.264119] BPF:
> >
> > It took me a while to reliably bisect this, but it clearly points to
> > this commit:
> >
> > e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
> >
> > One commit before it, 676535512684 ("mm/page_alloc: split per cpu page
> > lists and zone stats -fix"), works just fine.
> >
> > I'll have to spend more time debugging what exactly is happening, but
> > the immediate problem is two different definitions of numa_node
> > per-cpu variable. They both are at the same offset within
> > .data..percpu ELF section, they both have the same name, but one of
> > them is marked as static and another as global. And one is int
> > variable, while another is struct pagesets. I'll look some more
> > tomorrow, but adding Jiri and Arnaldo for visibility.
> >
> > [110907] DATASEC '.data..percpu' size=178904 vlen=303
> > ...
> > type_id=27753 offset=163976 size=4 (VAR 'numa_node')
> > type_id=27754 offset=163976 size=4 (VAR 'numa_node')
> >
> > [27753] VAR 'numa_node' type_id=27556, linkage=static
> > [27754] VAR 'numa_node' type_id=20, linkage=global
> >
> > [20] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> >
> > [27556] STRUCT 'pagesets' size=0 vlen=1
> > 'lock' type_id=507 bits_offset=0
> >
> > [506] STRUCT '(anon)' size=0 vlen=0
> > [507] TYPEDEF 'local_lock_t' type_id=506
> >
> > So also something weird about those zero-sized struct pagesets and
> > local_lock_t inside it.
>
> Ok, so nothing weird about them. local_lock_t is designed to be
> zero-sized unless CONFIG_DEBUG_LOCK_ALLOC is defined.
>
> But such zero-sized per-CPU variables are confusing pahole during BTF
> generation, as now two different variables "occupy" the same address.

FWIW, here's the pahole fix (it tried to filter zero-sized per-CPU
vars, but not quite completely).

[0] https://lore.kernel.org/bpf/[email protected]/T/#u

>
> Given this seems to be the first zero-sized per-CPU variable, I wonder
> if it would be ok to make sure it's never zero-sized, while pahole
> gets fixed and it's latest version gets widely packaged and
> distributed.
>
> Mel, what do you think about something like below? Or maybe you can
> advise some better solution?
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 41b87d6f840c..6a1d7511cae9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -124,6 +124,13 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
>
> struct pagesets {
> local_lock_t lock;
> +#if defined(CONFIG_DEBUG_INFO_BTF) && !defined(CONFIG_DEBUG_LOCK_ALLOC)
> + /* pahole 1.21 and earlier gets confused by zero-sized per-CPU
> + * variables and produces invalid BTF. So to accommodate earlier
> + * versions of pahole, ensure that sizeof(struct pagesets) is never 0.
> + */
> + char __filler;
> +#endif
> };
> static DEFINE_PER_CPU(struct pagesets, pagesets) = {
> .lock = INIT_LOCAL_LOCK(lock),
>
> >
> > > [ 26.264119]
> > > [ 26.267437] failed to validate module [efivarfs] BTF: -22
> > > [ 26.316724] systemd[1]: systemd 246.13+suse.105.g14581e0120 running in system mode. (+PAM +AUDIT +SELINUX -IMA +APPARMOR -SMACK +SYSVINI
> > > T +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
> > > [ 26.357990] systemd[1]: Detected architecture x86-64.
> > > [ 26.363068] systemd[1]: Running in initial RAM disk.
> > >
> >
> > [...]

2021-05-25 13:52:46

by Mel Gorman

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Mon, May 24, 2021 at 03:58:29PM -0700, Andrii Nakryiko wrote:
> > It took me a while to reliably bisect this, but it clearly points to
> > this commit:
> >
> > e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
> >
> > One commit before it, 676535512684 ("mm/page_alloc: split per cpu page
> > lists and zone stats -fix"), works just fine.
> >
> > I'll have to spend more time debugging what exactly is happening, but
> > the immediate problem is two different definitions of numa_node
> > per-cpu variable. They both are at the same offset within
> > .data..percpu ELF section, they both have the same name, but one of
> > them is marked as static and another as global. And one is int
> > variable, while another is struct pagesets. I'll look some more
> > tomorrow, but adding Jiri and Arnaldo for visibility.
> >
> > [110907] DATASEC '.data..percpu' size=178904 vlen=303
> > ...
> > type_id=27753 offset=163976 size=4 (VAR 'numa_node')
> > type_id=27754 offset=163976 size=4 (VAR 'numa_node')
> >
> > [27753] VAR 'numa_node' type_id=27556, linkage=static
> > [27754] VAR 'numa_node' type_id=20, linkage=global
> >
> > [20] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> >
> > [27556] STRUCT 'pagesets' size=0 vlen=1
> > 'lock' type_id=507 bits_offset=0
> >
> > [506] STRUCT '(anon)' size=0 vlen=0
> > [507] TYPEDEF 'local_lock_t' type_id=506
> >
> > So also something weird about those zero-sized struct pagesets and
> > local_lock_t inside it.
>
> Ok, so nothing weird about them. local_lock_t is designed to be
> zero-sized unless CONFIG_DEBUG_LOCK_ALLOC is defined.
>
> But such zero-sized per-CPU variables are confusing pahole during BTF
> generation, as now two different variables "occupy" the same address.
>
> Given this seems to be the first zero-sized per-CPU variable, I wonder
> if it would be ok to make sure it's never zero-sized, while pahole
> gets fixed and it's latest version gets widely packaged and
> distributed.
>
> Mel, what do you think about something like below? Or maybe you can
> advise some better solution?
>

Ouch, something like that may never go away. How about just this?

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 58426acf5983..dce2df33d823 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -338,6 +338,9 @@ config DEBUG_INFO_BTF
config PAHOLE_HAS_SPLIT_BTF
def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")

+config PAHOLE_HAS_ZEROSIZE_PERCPU_SUPPORT
+ def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "122")
+
config DEBUG_INFO_BTF_MODULES
def_bool y
depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1599985e0ee1..cb1f84848c99 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -124,6 +124,17 @@ static DEFINE_MUTEX(pcp_batch_high_lock);

struct pagesets {
local_lock_t lock;
+#if defined(CONFIG_DEBUG_INFO_BTF) && \
+ !defined(CONFIG_DEBUG_LOCK_ALLOC) && \
+ !defined(CONFIG_PAHOLE_HAS_ZEROSIZE_PERCPU_SUPPORT)
+ /*
+ * pahole 1.21 and earlier gets confused by zero-sized per-CPU
+ * variables and produces invalid BTF. Ensure that
+ * sizeof(struct pagesets) != 0 for older versions of pahole.
+ */
+ char __pahole_hack;
+ #warning "pahole too old to support zero-sized struct pagesets"
+#endif
};
static DEFINE_PER_CPU(struct pagesets, pagesets) = {
.lock = INIT_LOCAL_LOCK(lock),
diff --git a/scripts/rust-version.sh b/scripts/rust-version.sh
old mode 100644
new mode 100755
--
Mel Gorman
SUSE Labs

2021-05-25 23:35:14

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

On Tue, May 25, 2021 at 6:51 AM Mel Gorman <[email protected]> wrote:
>
> On Mon, May 24, 2021 at 03:58:29PM -0700, Andrii Nakryiko wrote:
> > > It took me a while to reliably bisect this, but it clearly points to
> > > this commit:
> > >
> > > e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
> > >
> > > One commit before it, 676535512684 ("mm/page_alloc: split per cpu page
> > > lists and zone stats -fix"), works just fine.
> > >
> > > I'll have to spend more time debugging what exactly is happening, but
> > > the immediate problem is two different definitions of numa_node
> > > per-cpu variable. They both are at the same offset within
> > > .data..percpu ELF section, they both have the same name, but one of
> > > them is marked as static and another as global. And one is int
> > > variable, while another is struct pagesets. I'll look some more
> > > tomorrow, but adding Jiri and Arnaldo for visibility.
> > >
> > > [110907] DATASEC '.data..percpu' size=178904 vlen=303
> > > ...
> > > type_id=27753 offset=163976 size=4 (VAR 'numa_node')
> > > type_id=27754 offset=163976 size=4 (VAR 'numa_node')
> > >
> > > [27753] VAR 'numa_node' type_id=27556, linkage=static
> > > [27754] VAR 'numa_node' type_id=20, linkage=global
> > >
> > > [20] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
> > >
> > > [27556] STRUCT 'pagesets' size=0 vlen=1
> > > 'lock' type_id=507 bits_offset=0
> > >
> > > [506] STRUCT '(anon)' size=0 vlen=0
> > > [507] TYPEDEF 'local_lock_t' type_id=506
> > >
> > > So also something weird about those zero-sized struct pagesets and
> > > local_lock_t inside it.
> >
> > Ok, so nothing weird about them. local_lock_t is designed to be
> > zero-sized unless CONFIG_DEBUG_LOCK_ALLOC is defined.
> >
> > But such zero-sized per-CPU variables are confusing pahole during BTF
> > generation, as now two different variables "occupy" the same address.
> >
> > Given this seems to be the first zero-sized per-CPU variable, I wonder
> > if it would be ok to make sure it's never zero-sized, while pahole
> > gets fixed and it's latest version gets widely packaged and
> > distributed.
> >
> > Mel, what do you think about something like below? Or maybe you can
> > advise some better solution?
> >
>
> Ouch, something like that may never go away. How about just this?

Yeah, that would work just fine, thanks! Would you like me to send a
formal patch or you'd like to do it?

>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 58426acf5983..dce2df33d823 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -338,6 +338,9 @@ config DEBUG_INFO_BTF
> config PAHOLE_HAS_SPLIT_BTF
> def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
>
> +config PAHOLE_HAS_ZEROSIZE_PERCPU_SUPPORT
> + def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "122")
> +
> config DEBUG_INFO_BTF_MODULES
> def_bool y
> depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1599985e0ee1..cb1f84848c99 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -124,6 +124,17 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
>
> struct pagesets {
> local_lock_t lock;
> +#if defined(CONFIG_DEBUG_INFO_BTF) && \
> + !defined(CONFIG_DEBUG_LOCK_ALLOC) && \
> + !defined(CONFIG_PAHOLE_HAS_ZEROSIZE_PERCPU_SUPPORT)
> + /*
> + * pahole 1.21 and earlier gets confused by zero-sized per-CPU
> + * variables and produces invalid BTF. Ensure that
> + * sizeof(struct pagesets) != 0 for older versions of pahole.
> + */
> + char __pahole_hack;
> + #warning "pahole too old to support zero-sized struct pagesets"
> +#endif
> };
> static DEFINE_PER_CPU(struct pagesets, pagesets) = {
> .lock = INIT_LOCAL_LOCK(lock),
> diff --git a/scripts/rust-version.sh b/scripts/rust-version.sh
> old mode 100644
> new mode 100755

Probably didn't intend to include this?

> --
> Mel Gorman
> SUSE Labs

2021-05-26 08:16:51

by Mel Gorman

[permalink] [raw]
Subject: Re: BPF: failed module verification on linux-next

> On Tue, May 25, 2021 at 6:51 AM Mel Gorman <[email protected]> wrote:
> >
> > On Mon, May 24, 2021 at 03:58:29PM -0700, Andrii Nakryiko wrote:
> > > > It took me a while to reliably bisect this, but it clearly points to
> > > > this commit:
> > > >
> > > > e481fac7d80b ("mm/page_alloc: convert per-cpu list protection to local_lock")
> > > > <SNIP>
> > >
> > > Ok, so nothing weird about them. local_lock_t is designed to be
> > > zero-sized unless CONFIG_DEBUG_LOCK_ALLOC is defined.
> > >
> > > But such zero-sized per-CPU variables are confusing pahole during BTF
> > > generation, as now two different variables "occupy" the same address.
> > >
> > > Given this seems to be the first zero-sized per-CPU variable, I wonder
> > > if it would be ok to make sure it's never zero-sized, while pahole
> > > gets fixed and it's latest version gets widely packaged and
> > > distributed.
> > >
> > > Mel, what do you think about something like below? Or maybe you can
> > > advise some better solution?
> > >
> >
> > Ouch, something like that may never go away. How about just this?
>
> Yeah, that would work just fine, thanks! Would you like me to send a
> formal patch or you'd like to do it?
>

Thanks Andrii for bisecting and debugging this, I used your analysis in
the changelog which I hope is ok. For future mailing list searches based
on the same bug, I sent a formal patch

https://lore.kernel.org/r/[email protected]

> > diff --git a/scripts/rust-version.sh b/scripts/rust-version.sh
> > old mode 100644
> > new mode 100755
>
> Probably didn't intend to include this?
>

That was an oversight when applying Andrew's mmotm tree which missed
setting the permissions on rust-version.sh and broke build.

--
Mel Gorman
SUSE Labs