2023-10-19 15:28:22

by Naresh Kamboju

[permalink] [raw]
Subject: re: autofs: add autofs_parse_fd()

The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
it as compat mode boot testing. Recently it started to failed to get login
prompt.

We have not seen any kernel crash logs.

Anders, bisection is pointing to first bad commit,
546694b8f658 autofs: add autofs_parse_fd()

Reported-by: Linux Kernel Functional Testing <[email protected]>
Reported-by: Anders Roxell <[email protected]>

steps to reproduce:
----------------
# To install tuxrun to your home directory at ~/.local/bin:
# pip3 install -U --user tuxrun==0.49.2
#
# Or install a deb/rpm depending on the running distribution
# See https://tuxmake.org/install-deb/ or
# https://tuxmake.org/install-rpm/
#
# See https://tuxrun.org/ for complete documentation.
#

tuxrun --runtime podman --device qemu-x86_64 --boot-args rw --kernel
https://storage.tuxsuite.com/public/linaro/lkft/builds/2WyQyyM0OvXnnbI0d84HL0v1J56/bzImage
--modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2WyQyyM0OvXnnbI0d84HL0v1J56/modules.tar.xz
--rootfs https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz
--parameters SKIPFILE=skipfile-lkft.yaml --parameters SHARD_NUMBER=10
--parameters SHARD_INDEX=3 --image
docker.io/linaro/tuxrun-dispatcher:v0.49.2 --tests ltp-cve --timeouts
boot=15 --overlay
https://storage.tuxboot.com/overlays/debian/bookworm/i386/ltp/20230929/ltp.tar.xz


Please find related links to test and results comparison.

Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231019/testrun/20695093/suite/boot/test/gcc-13-lkftconfig-compat/history/?page=1
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/testrun/20125035/suite/boot/test/gcc-13-lkftconfig-compat/details/
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2WyR2MGbyTalC8rGugHIZXPMldC/reproducer
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2WyR2MGbyTalC8rGugHIZXPMldC
--
Linaro LKFT
https://lkft.linaro.org


2023-10-20 06:37:21

by Arnd Bergmann

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
> it as compat mode boot testing. Recently it started to failed to get login
> prompt.
>
> We have not seen any kernel crash logs.
>
> Anders, bisection is pointing to first bad commit,
> 546694b8f658 autofs: add autofs_parse_fd()
>
> Reported-by: Linux Kernel Functional Testing <[email protected]>
> Reported-by: Anders Roxell <[email protected]>

I tried to find something in that commit that would be different
in compat mode, but don't see anything at all -- this appears
to be just a simple refactoring of the code, unlike the commits
that immediately follow it and that do change the mount
interface.

Unfortunately this makes it impossible to just revert the commit
on top of linux-next. Can you double-check your bisection by
testing 546694b8f658 and the commit before it again?

What are the exact mount options you pass to autofs in your fstab?

Arnd

2023-10-20 07:48:48

by Naresh Kamboju

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
> > it as compat mode boot testing. Recently it started to failed to get login
> > prompt.
> >
> > We have not seen any kernel crash logs.
> >
> > Anders, bisection is pointing to first bad commit,
> > 546694b8f658 autofs: add autofs_parse_fd()
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > Reported-by: Anders Roxell <[email protected]>
>
> I tried to find something in that commit that would be different
> in compat mode, but don't see anything at all -- this appears
> to be just a simple refactoring of the code, unlike the commits
> that immediately follow it and that do change the mount
> interface.
>
> Unfortunately this makes it impossible to just revert the commit
> on top of linux-next. Can you double-check your bisection by
> testing 546694b8f658 and the commit before it again?

I will try your suggested ways.

Is this information helpful ?
Linux-next the regression started happening from next-20230925.

GOOD: next-20230925
BAD: next-20230926

$ git log --oneline next-20230925..next-20230926 -- fs/autofs/
dede367149c4 autofs: fix protocol sub version setting
e6ec453bd0f0 autofs: convert autofs to use the new mount api
1f50012d9c63 autofs: validate protocol version
9b2731666d1d autofs: refactor parse_options()
7efd93ea790e autofs: reformat 0pt enum declaration
a7467430b4de autofs: refactor super block info init
546694b8f658 autofs: add autofs_parse_fd()
bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()

> What are the exact mount options you pass to autofs in your fstab?

mount output shows like this,
systemd-1 on /proc/sys/fs/binfmt_misc type autofs
(rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)

More information:
------------
+ mkdir -p /scratch
+ mount /dev/disk/by-id/ata-TOSHIBA_MG03ACA100_37O9KGL0F /scratch
[ 38.217507] EXT4-fs (sda): mounted filesystem
13d24d11-02a0-4cc0-842e-d339637e2564 r/w with ordered data mode. Quota
mode: none.
+ echo mounted
mounted
+ df -h
Filesystem
Size Used Avail Use% Mounted on
10.66.16.116:/var/lib/lava/dispatcher/tmp/6951354/extract-nfsrootfs-fsw9ebbf
559G 72G 487G 13% /
devtmpfs
7.8G 0 7.8G 0% /dev
tmpfs
7.8G 0 7.8G 0% /dev/shm
tmpfs
3.2G 640K 3.2G 1% /run
tmpfs
5.0M 0 5.0M 0% /run/lock
tmpfs
1.6G 0 1.6G 0% /run/user/0
/dev/sda
916G 28K 870G 1% /scratch
+ mount
10.66.16.116:/var/lib/lava/dispatcher/tmp/6951354/extract-nfsrootfs-fsw9ebbf
on / type nfs (rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.66.16.116,mountvers=3,mountproto=tcp,local_lock=all,addr=10.66.16.116)
devtmpfs on /dev type devtmpfs
(rw,relatime,size=8165164k,nr_inodes=2041291,mode=755)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,nosuid,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts
(rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs
(rw,nosuid,nodev,size=3266844k,nr_inodes=819200,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
cgroup2 on /sys/fs/cgroup type cgroup2
(rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs
(rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
ramfs on /run/credentials/systemd-sysctl.service type ramfs
(ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-sysusers.service type ramfs
(ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type
ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs
(ro,nosuid,nodev,noexec,relatime,mode=700)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc
(rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/user/0 type tmpfs
(rw,nosuid,nodev,relatime,size=1633420k,nr_inodes=408355,mode=700)
/dev/sda on /scratch type ext4 (rw,relatime)


Here is the bisect log for your reference,

# bad: [e3b18f7200f45d66f7141136c25554ac1e82009b] Add linux-next
specific files for 20231013
# good: [d3b4075b173f033387b614297bb4d998cf22c8bd] drm/msm/dp: use
correct lifetime device for devm_drm_bridge_add
git bisect start 'next-20231013' 'd3b4075b173f033387b614297bb4d998cf22c8bd'
# bad: [9949a257fbbef4bf21f77649d4585b6ba0d0abae] Merge branch
'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux.git
git bisect bad 9949a257fbbef4bf21f77649d4585b6ba0d0abae
# good: [39f9dc8a2d8e789ea31e343070a3b51c3f7ae3a2] Merge branch
'xtensa-for-next' of git://github.com/jcmvbkbc/linux-xtensa.git
git bisect good 39f9dc8a2d8e789ea31e343070a3b51c3f7ae3a2
# good: [4b32d7970b9865ad711a65094609a8c0ec9867a8] bcachefs: Change
journal_io.c assertion to error message
git bisect good 4b32d7970b9865ad711a65094609a8c0ec9867a8
# good: [79592709a731af69f33613490a7d88168fd52efb] bcachefs: Improve
key_visible_in_snapshot()
git bisect good 79592709a731af69f33613490a7d88168fd52efb
# good: [7ea2bcad6a0d5d15801abc318738d7eec3164d0a] Merge branch
'linux-next' of git://git.linux-nfs.org/projects/anna/linux-nfs.git
git bisect good 7ea2bcad6a0d5d15801abc318738d7eec3164d0a
# bad: [be636e0fa5fdb395366fd73a078e959d2587c476] Merge branch
'vfs.super' into vfs.all
git bisect bad be636e0fa5fdb395366fd73a078e959d2587c476
# good: [fea0e8fc7829dc85f82c8a1a8249630f6fb85553] fs: rename inode
i_atime and i_mtime fields
git bisect good fea0e8fc7829dc85f82c8a1a8249630f6fb85553
# bad: [18fa6d4d5ed5e6d219aeab9b92f602fa60020d95] Merge branch
'vfs.iov_iter' into vfs.all
git bisect bad 18fa6d4d5ed5e6d219aeab9b92f602fa60020d95
# good: [13f8510ba3215d2fb2ee7d1c64c0d51827ac28bd] ovl: rely on SB_I_NOUMASK
git bisect good 13f8510ba3215d2fb2ee7d1c64c0d51827ac28bd
# good: [b5f0e20f444cd150121e0ce912ebd3f2dabd12bc] iov_iter, net: Move
hash_and_copy_to_iter() to net/
git bisect good b5f0e20f444cd150121e0ce912ebd3f2dabd12bc
# bad: [1f50012d9c63c690f25956239bd25d10236405f8] autofs: validate
protocol version
git bisect bad 1f50012d9c63c690f25956239bd25d10236405f8
# bad: [a7467430b4de0985b7d1de8f1e50f8dd47eb6c4a] autofs: refactor
super block info init
git bisect bad a7467430b4de0985b7d1de8f1e50f8dd47eb6c4a
# bad: [546694b8f65807427a0104154abd8cdc633b36e3] autofs: add autofs_parse_fd()
git bisect bad 546694b8f65807427a0104154abd8cdc633b36e3
# good: [bc69fdde0ae1aff595590d802b6ef39114f2b260] autofs: refactor
autofs_prepare_pipe()
git bisect good bc69fdde0ae1aff595590d802b6ef39114f2b260
# first bad commit: [546694b8f65807427a0104154abd8cdc633b36e3] autofs:
add autofs_parse_fd()

- Naresh

2023-10-20 09:03:22

by Arnd Bergmann

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
> On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <[email protected]> wrote:
>>
>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>> > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
>> > it as compat mode boot testing. Recently it started to failed to get login
>> > prompt.
>> >
>> > We have not seen any kernel crash logs.
>> >
>> > Anders, bisection is pointing to first bad commit,
>> > 546694b8f658 autofs: add autofs_parse_fd()
>> >
>> > Reported-by: Linux Kernel Functional Testing <[email protected]>
>> > Reported-by: Anders Roxell <[email protected]>
>>
>> I tried to find something in that commit that would be different
>> in compat mode, but don't see anything at all -- this appears
>> to be just a simple refactoring of the code, unlike the commits
>> that immediately follow it and that do change the mount
>> interface.
>>
>> Unfortunately this makes it impossible to just revert the commit
>> on top of linux-next. Can you double-check your bisection by
>> testing 546694b8f658 and the commit before it again?
>
> I will try your suggested ways.
>
> Is this information helpful ?
> Linux-next the regression started happening from next-20230925.
>
> GOOD: next-20230925
> BAD: next-20230926
>
> $ git log --oneline next-20230925..next-20230926 -- fs/autofs/
> dede367149c4 autofs: fix protocol sub version setting
> e6ec453bd0f0 autofs: convert autofs to use the new mount api
> 1f50012d9c63 autofs: validate protocol version
> 9b2731666d1d autofs: refactor parse_options()
> 7efd93ea790e autofs: reformat 0pt enum declaration
> a7467430b4de autofs: refactor super block info init
> 546694b8f658 autofs: add autofs_parse_fd()
> bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()

Right, and it looks like the bottom five patches of this
should be fairly harmless as they only try to move code
around in preparation of the later changes, and even the
other ones should not cause any difference between a 32-bit
or a 64-bit /sbin/mount binary.

If the native (full 64-bit or full 32-bit) test run still
works with the same version, there may be some other difference
here.

>> What are the exact mount options you pass to autofs in your fstab?
>
> mount output shows like this,
> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)

This is only the binfmt-misc mount, which should not
prevent your rootfs from getting mounted, but it's possible
that failure to mount this prevents you from running
32-bit binaries.

I see this comes from the "proc-sys-fs-binfmt_misc.automount"
service in systemd. I see this is defined in
https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.automount
but I don't know exactly what its purpose is here. On a
64-bit system, you normally use compat_binfmt_elf.ko to run
32-bit binaries, and this does not require any specific mount
points. Alternatively, you could use binfmt_misc.ko with
the procfs mount to configure running arbitrary binary
formats such as arm32 on x86_64 with qemu-user emulation.

I double-checked your rootfs image from
https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz
to ensure that this indeed contains i386 executables rather than
arm32 ones, and that is all fine.

I also see in your log file at
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/testrun/20125035/suite/boot/test/gcc-13-lkftconfig-compat/log
that it is running the i386 binaries from the rootfs, but
it does get stuck soon after trying to set up the binfmt-misc
mount at the end of the log:

[[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems.
Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats...
Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories...
Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files...
[ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP)
[ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1
[ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left
quit

I'm a bit out of ideas at that point, my best guess now is
that your bisection points to something in autofs that makes
it hang while setting up autofs, but that neither autofs
nor binfmt-misc are actually being used otherwise.

Maybe you can try to modify your rootfs to disable or remove
the systemd-binfmt.service, to confirm that autofs is not
actually needed here but does cause the crash?

Arnd

2023-10-20 09:56:26

by Anders Roxell

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
>
> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
> > it as compat mode boot testing. Recently it started to failed to get login
> > prompt.
> >
> > We have not seen any kernel crash logs.
> >
> > Anders, bisection is pointing to first bad commit,
> > 546694b8f658 autofs: add autofs_parse_fd()
> >
> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > Reported-by: Anders Roxell <[email protected]>
>
> I tried to find something in that commit that would be different
> in compat mode, but don't see anything at all -- this appears
> to be just a simple refactoring of the code, unlike the commits
> that immediately follow it and that do change the mount
> interface.
>
> Unfortunately this makes it impossible to just revert the commit
> on top of linux-next. Can you double-check your bisection by
> testing 546694b8f658 and the commit before it again?

I tried these two patches again:
546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots

>
> What are the exact mount options you pass to autofs in your fstab?

cat /etc/fstab
# UNCONFIGURED FSTAB FOR BASE SYSTEM

Cheers,
Anders

2023-10-20 09:58:22

by Anders Roxell

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Fri, 20 Oct 2023 at 11:02, Arnd Bergmann <[email protected]> wrote:
>
> On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
> > On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <[email protected]> wrote:
> >>
> >> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> >> > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
> >> > it as compat mode boot testing. Recently it started to failed to get login
> >> > prompt.
> >> >
> >> > We have not seen any kernel crash logs.
> >> >
> >> > Anders, bisection is pointing to first bad commit,
> >> > 546694b8f658 autofs: add autofs_parse_fd()
> >> >
> >> > Reported-by: Linux Kernel Functional Testing <[email protected]>
> >> > Reported-by: Anders Roxell <[email protected]>
> >>
> >> I tried to find something in that commit that would be different
> >> in compat mode, but don't see anything at all -- this appears
> >> to be just a simple refactoring of the code, unlike the commits
> >> that immediately follow it and that do change the mount
> >> interface.
> >>
> >> Unfortunately this makes it impossible to just revert the commit
> >> on top of linux-next. Can you double-check your bisection by
> >> testing 546694b8f658 and the commit before it again?
> >
> > I will try your suggested ways.
> >
> > Is this information helpful ?
> > Linux-next the regression started happening from next-20230925.
> >
> > GOOD: next-20230925
> > BAD: next-20230926
> >
> > $ git log --oneline next-20230925..next-20230926 -- fs/autofs/
> > dede367149c4 autofs: fix protocol sub version setting
> > e6ec453bd0f0 autofs: convert autofs to use the new mount api
> > 1f50012d9c63 autofs: validate protocol version
> > 9b2731666d1d autofs: refactor parse_options()
> > 7efd93ea790e autofs: reformat 0pt enum declaration
> > a7467430b4de autofs: refactor super block info init
> > 546694b8f658 autofs: add autofs_parse_fd()
> > bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
>
> Right, and it looks like the bottom five patches of this
> should be fairly harmless as they only try to move code
> around in preparation of the later changes, and even the
> other ones should not cause any difference between a 32-bit
> or a 64-bit /sbin/mount binary.
>
> If the native (full 64-bit or full 32-bit) test run still
> works with the same version, there may be some other difference
> here.
>
> >> What are the exact mount options you pass to autofs in your fstab?
> >
> > mount output shows like this,
> > systemd-1 on /proc/sys/fs/binfmt_misc type autofs
> > (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
>
> This is only the binfmt-misc mount, which should not
> prevent your rootfs from getting mounted, but it's possible
> that failure to mount this prevents you from running
> 32-bit binaries.
>
> I see this comes from the "proc-sys-fs-binfmt_misc.automount"
> service in systemd. I see this is defined in
> https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.automount
> but I don't know exactly what its purpose is here. On a
> 64-bit system, you normally use compat_binfmt_elf.ko to run
> 32-bit binaries, and this does not require any specific mount
> points. Alternatively, you could use binfmt_misc.ko with
> the procfs mount to configure running arbitrary binary
> formats such as arm32 on x86_64 with qemu-user emulation.
>
> I double-checked your rootfs image from
> https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz
> to ensure that this indeed contains i386 executables rather than
> arm32 ones, and that is all fine.
>
> I also see in your log file at
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/testrun/20125035/suite/boot/test/gcc-13-lkftconfig-compat/log
> that it is running the i386 binaries from the rootfs, but
> it does get stuck soon after trying to set up the binfmt-misc
> mount at the end of the log:
>
> [[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems.
> Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats...
> Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories...
> Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files...
> [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP)
> [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1
> [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left
> quit
>
> I'm a bit out of ideas at that point, my best guess now is
> that your bisection points to something in autofs that makes
> it hang while setting up autofs, but that neither autofs
> nor binfmt-misc are actually being used otherwise.
>
> Maybe you can try to modify your rootfs to disable or remove
> the systemd-binfmt.service, to confirm that autofs is not
> actually needed here but does cause the crash?

I removed systemd-binfmt.service from the rootfs and booted
546694b8f658 ("autofs: add autofs_parse_fd()") and now it booted fine.

Cheers,
Anders

2023-10-20 10:47:26

by Dan Carpenter

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
> >
> > On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> > > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
> > > it as compat mode boot testing. Recently it started to failed to get login
> > > prompt.
> > >
> > > We have not seen any kernel crash logs.
> > >
> > > Anders, bisection is pointing to first bad commit,
> > > 546694b8f658 autofs: add autofs_parse_fd()
> > >
> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
> > > Reported-by: Anders Roxell <[email protected]>
> >
> > I tried to find something in that commit that would be different
> > in compat mode, but don't see anything at all -- this appears
> > to be just a simple refactoring of the code, unlike the commits
> > that immediately follow it and that do change the mount
> > interface.
> >
> > Unfortunately this makes it impossible to just revert the commit
> > on top of linux-next. Can you double-check your bisection by
> > testing 546694b8f658 and the commit before it again?
>
> I tried these two patches again:
> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
>

One difference that I notice between those two patches is that we no
long call autofs_prepare_pipe(). We just call autofs_check_pipe().

regards,
dan carpenter

2023-10-20 11:24:03

by Arnd Bergmann

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
>> >
>> > On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>> > > The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
>> > > it as compat mode boot testing. Recently it started to failed to get login
>> > > prompt.
>> > >
>> > > We have not seen any kernel crash logs.
>> > >
>> > > Anders, bisection is pointing to first bad commit,
>> > > 546694b8f658 autofs: add autofs_parse_fd()
>> > >
>> > > Reported-by: Linux Kernel Functional Testing <[email protected]>
>> > > Reported-by: Anders Roxell <[email protected]>
>> >
>> > I tried to find something in that commit that would be different
>> > in compat mode, but don't see anything at all -- this appears
>> > to be just a simple refactoring of the code, unlike the commits
>> > that immediately follow it and that do change the mount
>> > interface.
>> >
>> > Unfortunately this makes it impossible to just revert the commit
>> > on top of linux-next. Can you double-check your bisection by
>> > testing 546694b8f658 and the commit before it again?
>>
>> I tried these two patches again:
>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
>>
>
> One difference that I notice between those two patches is that we no
> long call autofs_prepare_pipe(). We just call autofs_check_pipe().

Indeed, so some of the f_flags end up being different. I assumed
this was done intentionally, but it might be worth checking if
the patch below makes any difference when the flags get put
back the way they were. This is probably not the correct fix, but
may help figure out what is going on. It should apply to anything
from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
linux-next:

--- a/fs/autofs/inode.c
+++ b/fs/autofs/inode.c
@@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc)
pr_debug("pipe fd = %d, pgrp = %u\n",
sbi->pipefd, pid_nr(sbi->oz_pgrp));

+ /* We want a packet pipe */
+ sbi->pipe->f_flags |= O_DIRECT;
+ /* We don't expect -EAGAIN */
+ sbi->pipe->f_flags &= ~O_NONBLOCK;
+
sbi->flags &= ~AUTOFS_SBI_CATATONIC;

/*

2023-10-20 12:38:09

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On 20/10/23 14:36, Arnd Bergmann wrote:
> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
>> it as compat mode boot testing. Recently it started to failed to get login
>> prompt.
>>
>> We have not seen any kernel crash logs.
>>
>> Anders, bisection is pointing to first bad commit,
>> 546694b8f658 autofs: add autofs_parse_fd()
>>
>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>> Reported-by: Anders Roxell <[email protected]>
> I tried to find something in that commit that would be different
> in compat mode, but don't see anything at all -- this appears
> to be just a simple refactoring of the code, unlike the commits
> that immediately follow it and that do change the mount
> interface.
>
> Unfortunately this makes it impossible to just revert the commit
> on top of linux-next. Can you double-check your bisection by
> testing 546694b8f658 and the commit before it again?
>
> What are the exact mount options you pass to autofs in your fstab?

I've only just seen this report, sorry about that.


I'll have a go at trying to reproduce it but debugging in an environment

like this will be something of a challenge I think.


Ian

2023-10-20 12:56:05

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()


On 20/10/23 17:02, Arnd Bergmann wrote:
> On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
>> On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <[email protected]> wrote:
>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
>>>> it as compat mode boot testing. Recently it started to failed to get login
>>>> prompt.
>>>>
>>>> We have not seen any kernel crash logs.
>>>>
>>>> Anders, bisection is pointing to first bad commit,
>>>> 546694b8f658 autofs: add autofs_parse_fd()
>>>>
>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>> Reported-by: Anders Roxell <[email protected]>
>>> I tried to find something in that commit that would be different
>>> in compat mode, but don't see anything at all -- this appears
>>> to be just a simple refactoring of the code, unlike the commits
>>> that immediately follow it and that do change the mount
>>> interface.
>>>
>>> Unfortunately this makes it impossible to just revert the commit
>>> on top of linux-next. Can you double-check your bisection by
>>> testing 546694b8f658 and the commit before it again?
>> I will try your suggested ways.
>>
>> Is this information helpful ?
>> Linux-next the regression started happening from next-20230925.
>>
>> GOOD: next-20230925
>> BAD: next-20230926
>>
>> $ git log --oneline next-20230925..next-20230926 -- fs/autofs/
>> dede367149c4 autofs: fix protocol sub version setting
>> e6ec453bd0f0 autofs: convert autofs to use the new mount api
>> 1f50012d9c63 autofs: validate protocol version
>> 9b2731666d1d autofs: refactor parse_options()
>> 7efd93ea790e autofs: reformat 0pt enum declaration
>> a7467430b4de autofs: refactor super block info init
>> 546694b8f658 autofs: add autofs_parse_fd()
>> bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
> Right, and it looks like the bottom five patches of this
> should be fairly harmless as they only try to move code
> around in preparation of the later changes, and even the
> other ones should not cause any difference between a 32-bit
> or a 64-bit /sbin/mount binary.
>
> If the native (full 64-bit or full 32-bit) test run still
> works with the same version, there may be some other difference
> here.
>
>>> What are the exact mount options you pass to autofs in your fstab?
>> mount output shows like this,
>> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
>> (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
> This is only the binfmt-misc mount, which should not
> prevent your rootfs from getting mounted, but it's possible
> that failure to mount this prevents you from running
> 32-bit binaries.
>
> I see this comes from the "proc-sys-fs-binfmt_misc.automount"
> service in systemd. I see this is defined in
> https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.automount
> but I don't know exactly what its purpose is here. On a
> 64-bit system, you normally use compat_binfmt_elf.ko to run
> 32-bit binaries, and this does not require any specific mount
> points. Alternatively, you could use binfmt_misc.ko with
> the procfs mount to configure running arbitrary binary
> formats such as arm32 on x86_64 with qemu-user emulation.
>
> I double-checked your rootfs image from
> https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz
> to ensure that this indeed contains i386 executables rather than
> arm32 ones, and that is all fine.
>
> I also see in your log file at
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/testrun/20125035/suite/boot/test/gcc-13-lkftconfig-compat/log
> that it is running the i386 binaries from the rootfs, but
> it does get stuck soon after trying to set up the binfmt-misc
> mount at the end of the log:
>
> [[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems.
> Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats...
> Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories...
> Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files...
> [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP)
> [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1
> [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left
> quit

Were there any console log messages at the time the problem occurred?


>
> I'm a bit out of ideas at that point, my best guess now is
> that your bisection points to something in autofs that makes
> it hang while setting up autofs, but that neither autofs
> nor binfmt-misc are actually being used otherwise.
>
> Maybe you can try to modify your rootfs to disable or remove
> the systemd-binfmt.service, to confirm that autofs is not
> actually needed here but does cause the crash?
>
> Arnd

2023-10-20 13:01:53

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On 20/10/23 17:57, Anders Roxell wrote:
> On Fri, 20 Oct 2023 at 11:02, Arnd Bergmann <[email protected]> wrote:
>> On Fri, Oct 20, 2023, at 09:48, Naresh Kamboju wrote:
>>> On Fri, 20 Oct 2023 at 12:07, Arnd Bergmann <[email protected]> wrote:
>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
>>>>> it as compat mode boot testing. Recently it started to failed to get login
>>>>> prompt.
>>>>>
>>>>> We have not seen any kernel crash logs.
>>>>>
>>>>> Anders, bisection is pointing to first bad commit,
>>>>> 546694b8f658 autofs: add autofs_parse_fd()
>>>>>
>>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>>> Reported-by: Anders Roxell <[email protected]>
>>>> I tried to find something in that commit that would be different
>>>> in compat mode, but don't see anything at all -- this appears
>>>> to be just a simple refactoring of the code, unlike the commits
>>>> that immediately follow it and that do change the mount
>>>> interface.
>>>>
>>>> Unfortunately this makes it impossible to just revert the commit
>>>> on top of linux-next. Can you double-check your bisection by
>>>> testing 546694b8f658 and the commit before it again?
>>> I will try your suggested ways.
>>>
>>> Is this information helpful ?
>>> Linux-next the regression started happening from next-20230925.
>>>
>>> GOOD: next-20230925
>>> BAD: next-20230926
>>>
>>> $ git log --oneline next-20230925..next-20230926 -- fs/autofs/
>>> dede367149c4 autofs: fix protocol sub version setting
>>> e6ec453bd0f0 autofs: convert autofs to use the new mount api
>>> 1f50012d9c63 autofs: validate protocol version
>>> 9b2731666d1d autofs: refactor parse_options()
>>> 7efd93ea790e autofs: reformat 0pt enum declaration
>>> a7467430b4de autofs: refactor super block info init
>>> 546694b8f658 autofs: add autofs_parse_fd()
>>> bc69fdde0ae1 autofs: refactor autofs_prepare_pipe()
>> Right, and it looks like the bottom five patches of this
>> should be fairly harmless as they only try to move code
>> around in preparation of the later changes, and even the
>> other ones should not cause any difference between a 32-bit
>> or a 64-bit /sbin/mount binary.
>>
>> If the native (full 64-bit or full 32-bit) test run still
>> works with the same version, there may be some other difference
>> here.
>>
>>>> What are the exact mount options you pass to autofs in your fstab?
>>> mount output shows like this,
>>> systemd-1 on /proc/sys/fs/binfmt_misc type autofs
>>> (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1421)
>> This is only the binfmt-misc mount, which should not
>> prevent your rootfs from getting mounted, but it's possible
>> that failure to mount this prevents you from running
>> 32-bit binaries.
>>
>> I see this comes from the "proc-sys-fs-binfmt_misc.automount"
>> service in systemd. I see this is defined in
>> https://github.com/systemd/systemd/blob/main/units/proc-sys-fs-binfmt_misc.automount
>> but I don't know exactly what its purpose is here. On a
>> 64-bit system, you normally use compat_binfmt_elf.ko to run
>> 32-bit binaries, and this does not require any specific mount
>> points. Alternatively, you could use binfmt_misc.ko with
>> the procfs mount to configure running arbitrary binary
>> formats such as arm32 on x86_64 with qemu-user emulation.
>>
>> I double-checked your rootfs image from
>> https://storage.tuxboot.com/debian/bookworm/i386/rootfs.ext4.xz
>> to ensure that this indeed contains i386 executables rather than
>> arm32 ones, and that is all fine.
>>
>> I also see in your log file at
>> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230926/testrun/20125035/suite/boot/test/gcc-13-lkftconfig-compat/log
>> that it is running the i386 binaries from the rootfs, but
>> it does get stuck soon after trying to set up the binfmt-misc
>> mount at the end of the log:
>>
>> [[0;32m OK [0m] Reached target [0;1;39mlocal-fs.target[0m - Local File Systems.
>> Starting [0;1;39msystemd-binfmt.se…et Up Additional Binary Formats...
>> Starting [0;1;39msystemd-tmpfiles-… Volatile Files and Directories...
>> Starting [0;1;39msystemd-udevd.ser…ger for Device Events and Files...
>> [ 15.869404] igb 0000:01:00.0 eno1: renamed from eth0 (while UP)
>> [ 15.883753] igb 0000:02:00.0 eno2: renamed from eth1
>> [ 20.053885] (udev-worker) (175) used greatest stack depth: 12416 bytes left
>> quit
>>
>> I'm a bit out of ideas at that point, my best guess now is
>> that your bisection points to something in autofs that makes
>> it hang while setting up autofs, but that neither autofs
>> nor binfmt-misc are actually being used otherwise.
>>
>> Maybe you can try to modify your rootfs to disable or remove
>> the systemd-binfmt.service, to confirm that autofs is not
>> actually needed here but does cause the crash?
> I removed systemd-binfmt.service from the rootfs and booted
> 546694b8f658 ("autofs: add autofs_parse_fd()") and now it booted fine.

I don't suppose you could try an automount after the boot is completed?


It seems a bit odd, it must be some sort of object lifetime inconsistency

but if that was the case automounts would at least fail to function mmm ...


Ian


2023-10-20 13:09:56

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On 20/10/23 19:23, Arnd Bergmann wrote:
> On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
>> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
>>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit rootfs we call
>>>>> it as compat mode boot testing. Recently it started to failed to get login
>>>>> prompt.
>>>>>
>>>>> We have not seen any kernel crash logs.
>>>>>
>>>>> Anders, bisection is pointing to first bad commit,
>>>>> 546694b8f658 autofs: add autofs_parse_fd()
>>>>>
>>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>>> Reported-by: Anders Roxell <[email protected]>
>>>> I tried to find something in that commit that would be different
>>>> in compat mode, but don't see anything at all -- this appears
>>>> to be just a simple refactoring of the code, unlike the commits
>>>> that immediately follow it and that do change the mount
>>>> interface.
>>>>
>>>> Unfortunately this makes it impossible to just revert the commit
>>>> on top of linux-next. Can you double-check your bisection by
>>>> testing 546694b8f658 and the commit before it again?
>>> I tried these two patches again:
>>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
>>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
>>>
>> One difference that I notice between those two patches is that we no
>> long call autofs_prepare_pipe(). We just call autofs_check_pipe().
> Indeed, so some of the f_flags end up being different. I assumed
> this was done intentionally, but it might be worth checking if
> the patch below makes any difference when the flags get put
> back the way they were. This is probably not the correct fix, but
> may help figure out what is going on. It should apply to anything
> from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
> linux-next:
>
> --- a/fs/autofs/inode.c
> +++ b/fs/autofs/inode.c
> @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block *s, struct fs_context *fc)
> pr_debug("pipe fd = %d, pgrp = %u\n",
> sbi->pipefd, pid_nr(sbi->oz_pgrp));
>
> + /* We want a packet pipe */
> + sbi->pipe->f_flags |= O_DIRECT;
> + /* We don't expect -EAGAIN */
> + sbi->pipe->f_flags &= ~O_NONBLOCK;
> +


That makes sense, we do want a packet pipe and that does also mean

we don't want a non-blocking pipe, it will be interesting to see

if that makes a difference. It's been a long time since Linus

implemented that packet pipe and I can't remember now what the

case was that lead to it.


Ian

> sbi->flags &= ~AUTOFS_SBI_CATATONIC;
>
> /*

2023-10-23 00:49:16

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On 20/10/23 21:09, Ian Kent wrote:
> On 20/10/23 19:23, Arnd Bergmann wrote:
>> On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
>>> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
>>>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
>>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit
>>>>>> rootfs we call
>>>>>> it as compat mode boot testing. Recently it started to failed to
>>>>>> get login
>>>>>> prompt.
>>>>>>
>>>>>> We have not seen any kernel crash logs.
>>>>>>
>>>>>> Anders, bisection is pointing to first bad commit,
>>>>>> 546694b8f658 autofs: add autofs_parse_fd()
>>>>>>
>>>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>>>> Reported-by: Anders Roxell <[email protected]>
>>>>> I tried to find something in that commit that would be different
>>>>> in compat mode, but don't see anything at all -- this appears
>>>>> to be just a simple refactoring of the code, unlike the commits
>>>>> that immediately follow it and that do change the mount
>>>>> interface.
>>>>>
>>>>> Unfortunately this makes it impossible to just revert the commit
>>>>> on top of linux-next. Can you double-check your bisection by
>>>>> testing 546694b8f658 and the commit before it again?
>>>> I tried these two patches again:
>>>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
>>>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
>>>>
>>> One difference that I notice between those two patches is that we no
>>> long call autofs_prepare_pipe().  We just call autofs_check_pipe().
>> Indeed, so some of the f_flags end up being different. I assumed
>> this was done intentionally, but it might be worth checking if
>> the patch below makes any difference when the flags get put
>> back the way they were. This is probably not the correct fix, but
>> may help figure out what is going on. It should apply to anything
>> from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
>> linux-next:
>>
>> --- a/fs/autofs/inode.c
>> +++ b/fs/autofs/inode.c
>> @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block
>> *s, struct fs_context *fc)
>>          pr_debug("pipe fd = %d, pgrp = %u\n",
>>                   sbi->pipefd, pid_nr(sbi->oz_pgrp));
>>   +        /* We want a packet pipe */
>> +        sbi->pipe->f_flags |= O_DIRECT;
>> +        /* We don't expect -EAGAIN */
>> +        sbi->pipe->f_flags &= ~O_NONBLOCK;
>> +
>
>
> That makes sense, we do want a packet pipe and that does also mean
>
> we don't want a non-blocking pipe, it will be interesting to see
>
> if that makes a difference. It's been a long time since Linus
>
> implemented that packet pipe and I can't remember now what the
>
> case was that lead to it.

After thinking about this over the weekend I'm pretty sure my mistake

is dropping the call to autofs_prepare_pipe() without adding the tail

end of it into autofs_parse_fd().


To explain a bit of history which I'll include in the fix description.

During autofs v5 development I decided to stay with the existing usage

instead of changing to a packed structure for autofs <=> user space

communications which turned out to be a mistake on my part.


Problems arose and they were fixed by allowing for the 64 bit to 32 bit

size difference in the automount(8) code.


Along the way systemd started to use autofs and eventually encountered

this problem too. systemd refused to compensate for the length difference

insisting it be fixed in the kernel. Fortunately Linus implemented the

packetized pipe which resolved the problem in a straight forward and

simple way.


So I pretty sure that the cause of the problem is the inadvertent dropping

of the flags setting in autofs_fill_super() that Arnd spotted although I

don't think putting it in autofs_fill_super() is the right thing to do.


I'll produce a patch today which includes most of this explanation for

future travelers ...


>
>
> Ian
>
>>          sbi->flags &= ~AUTOFS_SBI_CATATONIC;
>>            /*

2023-10-23 07:36:00

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On 23/10/23 08:48, Ian Kent wrote:
> On 20/10/23 21:09, Ian Kent wrote:
>> On 20/10/23 19:23, Arnd Bergmann wrote:
>>> On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
>>>> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
>>>>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
>>>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>>>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit
>>>>>>> rootfs we call
>>>>>>> it as compat mode boot testing. Recently it started to failed to
>>>>>>> get login
>>>>>>> prompt.
>>>>>>>
>>>>>>> We have not seen any kernel crash logs.
>>>>>>>
>>>>>>> Anders, bisection is pointing to first bad commit,
>>>>>>> 546694b8f658 autofs: add autofs_parse_fd()
>>>>>>>
>>>>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>>>>> Reported-by: Anders Roxell <[email protected]>
>>>>>> I tried to find something in that commit that would be different
>>>>>> in compat mode, but don't see anything at all -- this appears
>>>>>> to be just a simple refactoring of the code, unlike the commits
>>>>>> that immediately follow it and that do change the mount
>>>>>> interface.
>>>>>>
>>>>>> Unfortunately this makes it impossible to just revert the commit
>>>>>> on top of linux-next. Can you double-check your bisection by
>>>>>> testing 546694b8f658 and the commit before it again?
>>>>> I tried these two patches again:
>>>>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
>>>>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
>>>>>
>>>> One difference that I notice between those two patches is that we no
>>>> long call autofs_prepare_pipe().  We just call autofs_check_pipe().
>>> Indeed, so some of the f_flags end up being different. I assumed
>>> this was done intentionally, but it might be worth checking if
>>> the patch below makes any difference when the flags get put
>>> back the way they were. This is probably not the correct fix, but
>>> may help figure out what is going on. It should apply to anything
>>> from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
>>> linux-next:
>>>
>>> --- a/fs/autofs/inode.c
>>> +++ b/fs/autofs/inode.c
>>> @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block
>>> *s, struct fs_context *fc)
>>>          pr_debug("pipe fd = %d, pgrp = %u\n",
>>>                   sbi->pipefd, pid_nr(sbi->oz_pgrp));
>>>   +        /* We want a packet pipe */
>>> +        sbi->pipe->f_flags |= O_DIRECT;
>>> +        /* We don't expect -EAGAIN */
>>> +        sbi->pipe->f_flags &= ~O_NONBLOCK;
>>> +
>>
>>
>> That makes sense, we do want a packet pipe and that does also mean
>>
>> we don't want a non-blocking pipe, it will be interesting to see
>>
>> if that makes a difference. It's been a long time since Linus
>>
>> implemented that packet pipe and I can't remember now what the
>>
>> case was that lead to it.
>
> After thinking about this over the weekend I'm pretty sure my mistake
>
> is dropping the call to autofs_prepare_pipe() without adding the tail
>
> end of it into autofs_parse_fd().
>
>
> To explain a bit of history which I'll include in the fix description.
>
> During autofs v5 development I decided to stay with the existing usage
>
> instead of changing to a packed structure for autofs <=> user space
>
> communications which turned out to be a mistake on my part.
>
>
> Problems arose and they were fixed by allowing for the 64 bit to 32 bit
>
> size difference in the automount(8) code.
>
>
> Along the way systemd started to use autofs and eventually encountered
>
> this problem too. systemd refused to compensate for the length difference
>
> insisting it be fixed in the kernel. Fortunately Linus implemented the
>
> packetized pipe which resolved the problem in a straight forward and
>
> simple way.
>
>
> So I pretty sure that the cause of the problem is the inadvertent
> dropping
>
> of the flags setting in autofs_fill_super() that Arnd spotted although I
>
> don't think putting it in autofs_fill_super() is the right thing to do.
>
>
> I'll produce a patch today which includes most of this explanation for
>
> future travelers ...

So I have a patch.


I'm of two minds whether to try and use the instructions to reproduce this

or not because of experiences I have had with other similar testing
automation

systems that claim to provide a reproducer and end up a huge waste of
time and

are significantly frustrating.


Can someone please perform a test for me once I provide the patch?


Ian

>
>
>>
>>
>> Ian
>>
>>>          sbi->flags &= ~AUTOFS_SBI_CATATONIC;
>>>            /*

2023-10-23 13:58:18

by Anders Roxell

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On Mon, 23 Oct 2023 at 09:35, Ian Kent <[email protected]> wrote:
>
> On 23/10/23 08:48, Ian Kent wrote:
> > On 20/10/23 21:09, Ian Kent wrote:
> >> On 20/10/23 19:23, Arnd Bergmann wrote:
> >>> On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
> >>>> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
> >>>>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
> >>>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
> >>>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit
> >>>>>>> rootfs we call
> >>>>>>> it as compat mode boot testing. Recently it started to failed to
> >>>>>>> get login
> >>>>>>> prompt.
> >>>>>>>
> >>>>>>> We have not seen any kernel crash logs.
> >>>>>>>
> >>>>>>> Anders, bisection is pointing to first bad commit,
> >>>>>>> 546694b8f658 autofs: add autofs_parse_fd()
> >>>>>>>
> >>>>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
> >>>>>>> Reported-by: Anders Roxell <[email protected]>
> >>>>>> I tried to find something in that commit that would be different
> >>>>>> in compat mode, but don't see anything at all -- this appears
> >>>>>> to be just a simple refactoring of the code, unlike the commits
> >>>>>> that immediately follow it and that do change the mount
> >>>>>> interface.
> >>>>>>
> >>>>>> Unfortunately this makes it impossible to just revert the commit
> >>>>>> on top of linux-next. Can you double-check your bisection by
> >>>>>> testing 546694b8f658 and the commit before it again?
> >>>>> I tried these two patches again:
> >>>>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
> >>>>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
> >>>>>
> >>>> One difference that I notice between those two patches is that we no
> >>>> long call autofs_prepare_pipe(). We just call autofs_check_pipe().
> >>> Indeed, so some of the f_flags end up being different. I assumed
> >>> this was done intentionally, but it might be worth checking if
> >>> the patch below makes any difference when the flags get put
> >>> back the way they were. This is probably not the correct fix, but
> >>> may help figure out what is going on. It should apply to anything
> >>> from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
> >>> linux-next:
> >>>
> >>> --- a/fs/autofs/inode.c
> >>> +++ b/fs/autofs/inode.c
> >>> @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block
> >>> *s, struct fs_context *fc)
> >>> pr_debug("pipe fd = %d, pgrp = %u\n",
> >>> sbi->pipefd, pid_nr(sbi->oz_pgrp));
> >>> + /* We want a packet pipe */
> >>> + sbi->pipe->f_flags |= O_DIRECT;
> >>> + /* We don't expect -EAGAIN */
> >>> + sbi->pipe->f_flags &= ~O_NONBLOCK;
> >>> +
> >>
> >>
> >> That makes sense, we do want a packet pipe and that does also mean
> >>
> >> we don't want a non-blocking pipe, it will be interesting to see
> >>
> >> if that makes a difference. It's been a long time since Linus
> >>
> >> implemented that packet pipe and I can't remember now what the
> >>
> >> case was that lead to it.
> >
> > After thinking about this over the weekend I'm pretty sure my mistake
> >
> > is dropping the call to autofs_prepare_pipe() without adding the tail
> >
> > end of it into autofs_parse_fd().
> >
> >
> > To explain a bit of history which I'll include in the fix description.
> >
> > During autofs v5 development I decided to stay with the existing usage
> >
> > instead of changing to a packed structure for autofs <=> user space
> >
> > communications which turned out to be a mistake on my part.
> >
> >
> > Problems arose and they were fixed by allowing for the 64 bit to 32 bit
> >
> > size difference in the automount(8) code.
> >
> >
> > Along the way systemd started to use autofs and eventually encountered
> >
> > this problem too. systemd refused to compensate for the length difference
> >
> > insisting it be fixed in the kernel. Fortunately Linus implemented the
> >
> > packetized pipe which resolved the problem in a straight forward and
> >
> > simple way.
> >
> >
> > So I pretty sure that the cause of the problem is the inadvertent
> > dropping
> >
> > of the flags setting in autofs_fill_super() that Arnd spotted although I
> >
> > don't think putting it in autofs_fill_super() is the right thing to do.
> >
> >
> > I'll produce a patch today which includes most of this explanation for
> >
> > future travelers ...
>
> So I have a patch.
>
>
> I'm of two minds whether to try and use the instructions to reproduce this
>
> or not because of experiences I have had with other similar testing
> automation
>
> systems that claim to provide a reproducer and end up a huge waste of
> time and
>
> are significantly frustrating.
>
>
> Can someone please perform a test for me once I provide the patch?

Just tested it, and it passed our tests. Added a tested by flag in your patch.

Thanks for the prompt fix.

Cheers,
Anders





>
>
> Ian
>
> >
> >
> >>
> >>
> >> Ian
> >>
> >>> sbi->flags &= ~AUTOFS_SBI_CATATONIC;
> >>> /*

2023-10-23 22:25:50

by Ian Kent

[permalink] [raw]
Subject: Re: autofs: add autofs_parse_fd()

On 23/10/23 21:57, Anders Roxell wrote:
> On Mon, 23 Oct 2023 at 09:35, Ian Kent <[email protected]> wrote:
>> On 23/10/23 08:48, Ian Kent wrote:
>>> On 20/10/23 21:09, Ian Kent wrote:
>>>> On 20/10/23 19:23, Arnd Bergmann wrote:
>>>>> On Fri, Oct 20, 2023, at 12:45, Dan Carpenter wrote:
>>>>>> On Fri, Oct 20, 2023 at 11:55:57AM +0200, Anders Roxell wrote:
>>>>>>> On Fri, 20 Oct 2023 at 08:37, Arnd Bergmann <[email protected]> wrote:
>>>>>>>> On Thu, Oct 19, 2023, at 17:27, Naresh Kamboju wrote:
>>>>>>>>> The qemu-x86_64 and x86_64 booting with 64bit kernel and 32bit
>>>>>>>>> rootfs we call
>>>>>>>>> it as compat mode boot testing. Recently it started to failed to
>>>>>>>>> get login
>>>>>>>>> prompt.
>>>>>>>>>
>>>>>>>>> We have not seen any kernel crash logs.
>>>>>>>>>
>>>>>>>>> Anders, bisection is pointing to first bad commit,
>>>>>>>>> 546694b8f658 autofs: add autofs_parse_fd()
>>>>>>>>>
>>>>>>>>> Reported-by: Linux Kernel Functional Testing <[email protected]>
>>>>>>>>> Reported-by: Anders Roxell <[email protected]>
>>>>>>>> I tried to find something in that commit that would be different
>>>>>>>> in compat mode, but don't see anything at all -- this appears
>>>>>>>> to be just a simple refactoring of the code, unlike the commits
>>>>>>>> that immediately follow it and that do change the mount
>>>>>>>> interface.
>>>>>>>>
>>>>>>>> Unfortunately this makes it impossible to just revert the commit
>>>>>>>> on top of linux-next. Can you double-check your bisection by
>>>>>>>> testing 546694b8f658 and the commit before it again?
>>>>>>> I tried these two patches again:
>>>>>>> 546694b8f658 ("autofs: add autofs_parse_fd()") - doesn't boot
>>>>>>> bc69fdde0ae1 ("autofs: refactor autofs_prepare_pipe()") - boots
>>>>>>>
>>>>>> One difference that I notice between those two patches is that we no
>>>>>> long call autofs_prepare_pipe(). We just call autofs_check_pipe().
>>>>> Indeed, so some of the f_flags end up being different. I assumed
>>>>> this was done intentionally, but it might be worth checking if
>>>>> the patch below makes any difference when the flags get put
>>>>> back the way they were. This is probably not the correct fix, but
>>>>> may help figure out what is going on. It should apply to anything
>>>>> from 546694b8f658 ("autofs: add autofs_parse_fd()") to the current
>>>>> linux-next:
>>>>>
>>>>> --- a/fs/autofs/inode.c
>>>>> +++ b/fs/autofs/inode.c
>>>>> @@ -358,6 +358,11 @@ static int autofs_fill_super(struct super_block
>>>>> *s, struct fs_context *fc)
>>>>> pr_debug("pipe fd = %d, pgrp = %u\n",
>>>>> sbi->pipefd, pid_nr(sbi->oz_pgrp));
>>>>> + /* We want a packet pipe */
>>>>> + sbi->pipe->f_flags |= O_DIRECT;
>>>>> + /* We don't expect -EAGAIN */
>>>>> + sbi->pipe->f_flags &= ~O_NONBLOCK;
>>>>> +
>>>>
>>>> That makes sense, we do want a packet pipe and that does also mean
>>>>
>>>> we don't want a non-blocking pipe, it will be interesting to see
>>>>
>>>> if that makes a difference. It's been a long time since Linus
>>>>
>>>> implemented that packet pipe and I can't remember now what the
>>>>
>>>> case was that lead to it.
>>> After thinking about this over the weekend I'm pretty sure my mistake
>>>
>>> is dropping the call to autofs_prepare_pipe() without adding the tail
>>>
>>> end of it into autofs_parse_fd().
>>>
>>>
>>> To explain a bit of history which I'll include in the fix description.
>>>
>>> During autofs v5 development I decided to stay with the existing usage
>>>
>>> instead of changing to a packed structure for autofs <=> user space
>>>
>>> communications which turned out to be a mistake on my part.
>>>
>>>
>>> Problems arose and they were fixed by allowing for the 64 bit to 32 bit
>>>
>>> size difference in the automount(8) code.
>>>
>>>
>>> Along the way systemd started to use autofs and eventually encountered
>>>
>>> this problem too. systemd refused to compensate for the length difference
>>>
>>> insisting it be fixed in the kernel. Fortunately Linus implemented the
>>>
>>> packetized pipe which resolved the problem in a straight forward and
>>>
>>> simple way.
>>>
>>>
>>> So I pretty sure that the cause of the problem is the inadvertent
>>> dropping
>>>
>>> of the flags setting in autofs_fill_super() that Arnd spotted although I
>>>
>>> don't think putting it in autofs_fill_super() is the right thing to do.
>>>
>>>
>>> I'll produce a patch today which includes most of this explanation for
>>>
>>> future travelers ...
>> So I have a patch.
>>
>>
>> I'm of two minds whether to try and use the instructions to reproduce this
>>
>> or not because of experiences I have had with other similar testing
>> automation
>>
>> systems that claim to provide a reproducer and end up a huge waste of
>> time and
>>
>> are significantly frustrating.
>>
>>
>> Can someone please perform a test for me once I provide the patch?
> Just tested it, and it passed our tests. Added a tested by flag in your patch.
>
> Thanks for the prompt fix.

That's great to hear Anders, thanks for doing the testing, ;)


Ian