2022-09-22 12:44:00

by Petr Štetiar

Subject: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

Hi,

we've got a recent bug report[1] that lscpu segfaults on an aarch64 board running
a 5.15.y kernel. It works fine on a 5.10.y kernel.

I've tracked it down[2] to an issue with `topology/thread_siblings`, which,
apart from reporting a very strange file size, returns empty content. I assume
it's somehow related to the changes made in commit bb9ec13d156e ("topology: use
bin_attribute to break the size limitation of cpumap ABI"), but I haven't tried
reverting it yet to verify that.

Kernel 5.15.68:

root@OpenWrt:/# uname -a
Linux OpenWrt 5.15.68 #0 SMP Wed Sep 21 05:54:21 2022 aarch64 GNU/Linux

root@OpenWrt:/# find /sys -name thread_siblings -exec ls -al {} \;
-r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu1/topology/thread_siblings
-r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu0/topology/thread_siblings

root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
root@OpenWrt:/#

Kernel 5.10.138:

root@OpenWrt:/# uname -a
Linux OpenWrt 5.10.138 #0 SMP Sat Sep 3 02:55:34 2022 aarch64 GNU/Linux

root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
2
1

root@OpenWrt:/# find /sys -name thread_siblings -exec ls -al {} \;
-r--r--r-- 1 root root 4096 Sep 22 11:12 /sys/devices/system/cpu/cpu1/topology/thread_siblings
-r--r--r-- 1 root root 4096 Sep 22 11:12 /sys/devices/system/cpu/cpu0/topology/thread_siblings


1. https://github.com/openwrt/openwrt/issues/10737
2. https://github.com/util-linux/util-linux/pull/1821


Cheers,

Petr


2022-09-22 13:14:09

by Phil Auld

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

On Thu, Sep 22, 2022 at 01:32:17PM +0200 Petr Štetiar wrote:
> Hi,
>
> we've got a recent bug report[1] that lscpu segfaults on an aarch64 board running
> a 5.15.y kernel. It works fine on a 5.10.y kernel.
>
> I've tracked it down[2] to an issue with `topology/thread_siblings`, which,
> apart from reporting a very strange file size, returns empty content. I assume
> it's somehow related to the changes made in commit bb9ec13d156e ("topology: use
> bin_attribute to break the size limitation of cpumap ABI"), but I haven't tried
> reverting it yet to verify that.
>

This is actually due to a fix for that commit, since returning a size of 0
breaks things as well:

7ee951acd31a drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist

The fix for a small number of CPUs, as you have, is now in Greg's driver core tree:

d7f06bdd6ee8 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES

and should work its way back to the stable trees soon.
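
For readers following along, the arithmetic behind that -1 can be sketched as below. This is only a model in Python, assuming the macro has roughly the shape `((NR_CPUS * 9)/32 - 1) > PAGE_SIZE ? (NR_CPUS * 9)/32 - 1 : PAGE_SIZE` with an unsigned `PAGE_SIZE` of 4096. The macro body is not quoted in this thread, and the function names are made up for illustration.

```python
PAGE_SIZE = 4096          # assumed page size, treated as unsigned long in C
ULONG_MOD = 1 << 64       # 64-bit unsigned long, as on aarch64


def as_unsigned_long(value):
    """Model C's conversion of a signed int to a 64-bit unsigned long."""
    return value % ULONG_MOD


def buggy_max_bytes(nr_cpus):
    """Assumed pre-fix shape: subtract 1 first, then compare with PAGE_SIZE."""
    candidate = (nr_cpus * 9) // 32 - 1          # -1 when nr_cpus * 9 < 32
    # Comparing an int with an unsigned long converts the int to unsigned,
    # so -1 becomes 2**64 - 1 and the comparison is unexpectedly true.
    if as_unsigned_long(candidate) > PAGE_SIZE:
        return as_unsigned_long(candidate)
    return PAGE_SIZE


def fixed_max_bytes(nr_cpus):
    """Assumed post-fix shape: compare first, subtract only on the large path."""
    quotient = (nr_cpus * 9) // 32
    return quotient - 1 if quotient > PAGE_SIZE else PAGE_SIZE


if __name__ == "__main__":
    for nr_cpus in (2, 64, 32768):
        print(nr_cpus, buggy_max_bytes(nr_cpus), fixed_max_bytes(nr_cpus))
```

With a hypothetical NR_CPUS of 2, the buggy variant yields 18446744073709551615, the size ls shows in the report, while the reordered comparison falls back to 4096. For larger values both variants agree.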


Cheers,
Phil


> Kernel 5.15.68:
>
> root@OpenWrt:/# uname -a
> Linux OpenWrt 5.15.68 #0 SMP Wed Sep 21 05:54:21 2022 aarch64 GNU/Linux
>
> root@OpenWrt:/# find /sys -name thread_siblings -exec ls -al {} \;
> -r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu1/topology/thread_siblings
> -r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu0/topology/thread_siblings
>
> root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
> root@OpenWrt:/#
>
> Kernel 5.10.138:
>
> root@OpenWrt:/# uname -a
> Linux OpenWrt 5.10.138 #0 SMP Sat Sep 3 02:55:34 2022 aarch64 GNU/Linux
>
> root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
> 2
> 1
>
> root@OpenWrt:/# find /sys -name thread_siblings -exec ls -al {} \;
> -r--r--r-- 1 root root 4096 Sep 22 11:12 /sys/devices/system/cpu/cpu1/topology/thread_siblings
> -r--r--r-- 1 root root 4096 Sep 22 11:12 /sys/devices/system/cpu/cpu0/topology/thread_siblings
>
>
> 1. https://github.com/openwrt/openwrt/issues/10737
> 2. https://github.com/util-linux/util-linux/pull/1821
>
>
> Cheers,
>
> Petr
>


2022-09-22 13:38:21

by Greg Kroah-Hartman

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

On Thu, Sep 22, 2022 at 08:32:10AM -0400, Phil Auld wrote:
> On Thu, Sep 22, 2022 at 01:32:17PM +0200 Petr Štetiar wrote:
> > Hi,
> >
> > we've got a recent bug report[1] that lscpu segfaults on an aarch64 board running
> > a 5.15.y kernel. It works fine on a 5.10.y kernel.
> >
> > I've tracked it down[2] to an issue with `topology/thread_siblings`, which,
> > apart from reporting a very strange file size, returns empty content. I assume
> > it's somehow related to the changes made in commit bb9ec13d156e ("topology: use
> > bin_attribute to break the size limitation of cpumap ABI"), but I haven't tried
> > reverting it yet to verify that.
> >
>
> This is actually due to a fix for that commit, since returning a size of 0
> breaks things as well:
>
> 7ee951acd31a drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist
>
> The fix for a small number of CPUs, as you have, is now in Greg's driver core tree:
>
> d7f06bdd6ee8 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
>
> and should work its way back to the stable trees soon.

That should fix up the file size issue.

The main problem being reported here is:

> > Kernel 5.15.68:
> >
> > root@OpenWrt:/# uname -a
> > Linux OpenWrt 5.15.68 #0 SMP Wed Sep 21 05:54:21 2022 aarch64 GNU/Linux
> >
> > root@OpenWrt:/# find /sys -name thread_siblings -exec ls -al {} \;
> > -r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu1/topology/thread_siblings
> > -r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu0/topology/thread_siblings
> >
> > root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
> > root@OpenWrt:/#

Nothing in the file in 5.15, yet 5.10:

> >
> > Kernel 5.10.138:
> >
> > root@OpenWrt:/# uname -a
> > Linux OpenWrt 5.10.138 #0 SMP Sat Sep 3 02:55:34 2022 aarch64 GNU/Linux
> >
> > root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
> > 2
> > 1

Has data in the files.

What caused that change?

thanks,

greg k-h

2022-09-22 14:02:55

by Phil Auld

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

On Thu, Sep 22, 2022 at 02:40:00PM +0200 Greg Kroah-Hartman wrote:
> On Thu, Sep 22, 2022 at 08:32:10AM -0400, Phil Auld wrote:
> > On Thu, Sep 22, 2022 at 01:32:17PM +0200 Petr Štetiar wrote:
> > > Hi,
> > >
> > > we've got a recent bug report[1] that lscpu segfaults on an aarch64 board running
> > > a 5.15.y kernel. It works fine on a 5.10.y kernel.
> > >
> > > I've tracked it down[2] to an issue with `topology/thread_siblings`, which,
> > > apart from reporting a very strange file size, returns empty content. I assume
> > > it's somehow related to the changes made in commit bb9ec13d156e ("topology: use
> > > bin_attribute to break the size limitation of cpumap ABI"), but I haven't tried
> > > reverting it yet to verify that.
> > >
> >
> > This is actually due to a fix for that commit, since returning a size of 0
> > breaks things as well:
> >
> > 7ee951acd31a drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist
> >
> > The fix for a small number of CPUs, as you have, is now in Greg's driver core tree:
> >
> > d7f06bdd6ee8 drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES
> >
> > and should work its way back to the stable trees soon.
>
> That should fix up the file size issue.
>
> The main problem being reported here is:
>
> > > Kernel 5.15.68:
> > >
> > > root@OpenWrt:/# uname -a
> > > Linux OpenWrt 5.15.68 #0 SMP Wed Sep 21 05:54:21 2022 aarch64 GNU/Linux
> > >
> > > root@OpenWrt:/# find /sys -name thread_siblings -exec ls -al {} \;
> > > -r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu1/topology/thread_siblings
> > > -r--r--r-- 1 root root 18446744073709551615 Sep 22 08:37 /sys/devices/system/cpu/cpu0/topology/thread_siblings
> > >
> > > root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
> > > root@OpenWrt:/#
>
> Nothing in the file in 5.15, yet 5.10:
>
> > >
> > > Kernel 5.10.138:
> > >
> > > root@OpenWrt:/# uname -a
> > > Linux OpenWrt 5.10.138 #0 SMP Sat Sep 3 02:55:34 2022 aarch64 GNU/Linux
> > >
> > > root@OpenWrt:/# find /sys -name thread_siblings -exec cat {} \;
> > > 2
> > > 1
>
> Has data in the files.
>

Good point. My eyes latched on to that huge file size for some reason ;)

> What caused that change?

I've seen the size cause problems for tools. Are we sure that it's the empty file and not
the size causing issues? Maybe something is treating that as signed again for a count of
-1 bytes (which seems like it would be a bug anyway)?
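
To make that hypothesis concrete, here is a minimal sketch of what happens if a tool stores the reported size in a signed 64-bit integer. The helper name is hypothetical and no particular tool is known to do this; it only models the reinterpretation.

```python
def as_signed64(value):
    """Reinterpret the low 64 bits of `value` as a two's-complement int64."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    reported_size = 18446744073709551615   # size shown by ls on 5.15.68
    print(as_signed64(reported_size))       # the same bits read as signed
    print(as_signed64(4096))                # the 5.10.138 size is unaffected
```

The huge size and a signed count of -1 are the same bit pattern, which is why a tool that sizes a buffer or a read from it could misbehave.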


Cheers,
Phil

>
> thanks,
>
> greg k-h
>


2022-09-22 14:28:59

by Petr Štetiar

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

Phil Auld <[email protected]> [2022-09-22 09:18:47]:

Hi,

> I've seen the size cause problems for tools. Are we sure that it's the empty file and not
> the size causing issues? Maybe something is treating that as signed again for a count of
> -1 bytes (which seems like it would be a bug anyway)?

root@OpenWrt:/# strace cat /sys/devices/system/cpu/cpu1/topology/thread_siblings
...snip...
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/topology/thread_siblings", O_RDONLY|O_LARGEFILE) = 3
sendfile(1, 3, NULL, 16777216) = 0

root@OpenWrt:/# strace md5sum /sys/devices/system/cpu/cpu1/topology/thread_sibli
...snip...
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/topology/thread_siblings", O_RDONLY|O_LARGEFILE) = 3
read(3, "", 4096) = 0

root@OpenWrt:/# strace head /sys/devices/system/cpu/cpu1/topology/thread_siblings
...snip...
openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/topology/thread_siblings", O_RDONLY|O_LARGEFILE) = 3
read(3, "", 1024) = 0
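
The same check can be scripted without strace. The sketch below reads the file in fixed-size chunks until read() returns 0, so it never relies on the reported file size, and it treats empty content as "no mask" instead of trying to parse it. The helper name `read_cpumask` is hypothetical; the only assumption about the file format is the usual sysfs cpumap layout of hexadecimal digits in comma-separated groups.

```python
import os


def read_cpumask(path, chunk=4096):
    """Return the CPU mask stored in `path` as an int, or None if it is empty."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while True:
            data = os.read(fd, chunk)   # returns b"" at end of file
            if not data:
                break
            parts.append(data)
    finally:
        os.close(fd)

    text = b"".join(parts).decode("ascii").strip()
    if not text:
        return None                     # the 5.15.68 symptom: no content at all
    # Masks wider than 32 bits are printed as comma-separated hex groups.
    return int(text.replace(",", ""), 16)


if __name__ == "__main__":
    path = "/sys/devices/system/cpu/cpu1/topology/thread_siblings"
    if os.path.exists(path):
        print(read_cpumask(path))
```

On the affected kernel this would be expected to return None for both CPUs, since every read shown above returns 0 bytes.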

Cheers,

Petr

2022-09-22 17:51:01

by Phil Auld

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

On Thu, Sep 22, 2022 at 04:05:04PM +0200 Petr Štetiar wrote:
> Phil Auld <[email protected]> [2022-09-22 09:18:47]:
>
> Hi,
>
> > I've seen the size cause problems for tools. Are we sure that it's the empty file and not
> > the size causing issues? Maybe something is treating that as signed again for a count of
> > -1 bytes (which seems like it would be a bug anyway)?
>
> root@OpenWrt:/# strace cat /sys/devices/system/cpu/cpu1/topology/thread_siblings
> ...snip...
> openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/topology/thread_siblings", O_RDONLY|O_LARGEFILE) = 3
> sendfile(1, 3, NULL, 16777216) = 0
>
> root@OpenWrt:/# strace md5sum /sys/devices/system/cpu/cpu1/topology/thread_sibli
> ...snip...
> openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/topology/thread_siblings", O_RDONLY|O_LARGEFILE) = 3
> read(3, "", 4096) = 0
>
> root@OpenWrt:/# strace head /sys/devices/system/cpu/cpu1/topology/thread_siblings
> ...snip...
> openat(AT_FDCWD, "/sys/devices/system/cpu/cpu1/topology/thread_siblings", O_RDONLY|O_LARGEFILE) = 3
> read(3, "", 1024) = 0
>

I tried this with the latest upstream (which doesn't yet have the fix
for the size issue) and got the same results.

Then I applied the fix and the problem went away:

6.0.0-rc6.nr_cpus2+
# find /sys -name thread_siblings -exec cat \{\} \;
2
1

Cheers,
Phil

> Cheers,
>
> Petr
>


2022-09-22 20:47:12

by Petr Štetiar

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

Phil Auld <[email protected]> [2022-09-22 13:18:12]:

> Then I applied the fix and the problem went away:

I've just tried the same on aarch64 and can confirm that the
patch fixes the issue.

Cheers,

Petr

2022-09-23 09:59:22

by Greg Kroah-Hartman

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

On Thu, Sep 22, 2022 at 10:05:06PM +0200, Petr Štetiar wrote:
> Phil Auld <[email protected]> [2022-09-22 13:18:12]:
>
> > Then I applied the fix and the problem went away:
>
> I've just tried the same on aarch64 and can confirm that the
> patch fixes the issue.

Wow, that's odd that the file size matters here.

Ok, I'll send this to Linus in a few hours, thanks.

greg k-h

2022-09-23 13:56:18

by Phil Auld

Subject: Re: aarch64 5.15.68 regression in topology/thread_siblings (huge file size and no content)

On Fri, Sep 23, 2022 at 11:16:55AM +0200 Greg Kroah-Hartman wrote:
> On Thu, Sep 22, 2022 at 10:05:06PM +0200, Petr Štetiar wrote:
> > Phil Auld <[email protected]> [2022-09-22 13:18:12]:
> >
> > > Then I applied the fix and the problem went away:
> >
> > I've just tried the same on aarch64 and can confirm that the
> > patch fixes the issue.
>
> Wow, that's odd that the file size matters here.
>

Yeah, I looked through the code some but nothing jumped out where
that unsigned -1 could cause a problem (like count + 1 wrapping
to 0 or something).
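
The kind of wrap meant here can be modeled in a few lines. This is purely illustrative of 64-bit unsigned arithmetic, with a hypothetical helper name, and does not point at any specific kernel code path.

```python
MASK64 = (1 << 64) - 1


def add_u64(a, b):
    """Add two values the way a 64-bit unsigned type would, wrapping on overflow."""
    return (a + b) & MASK64


if __name__ == "__main__":
    size = -1 & MASK64            # -1 stored in an unsigned 64-bit size
    print(size)                   # the huge size from the report
    print(add_u64(size, 1))       # a "count + 1" style computation wraps
```

A length computed as count + 1 from that size would come out as 0, which is the sort of thing that could turn a read into a no-op.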

> Ok, I'll send this to Linus in a few hours, thanks.

Thanks!


Cheers,
Phil

>
> greg k-h
>
