2020-01-16 11:35:05

by Krzysztof Kozlowski

Subject: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

Hi all,

Bisect pointed to 6d972518b821 ("NFS: Add fs_context support.") for
failures of mounting NFS v4 root on my boards:
mount.nfs4 -o vers=4,nolock 192.168.1.10:/srv/nfs/odroidhc1 /new_root
[ 24.980839] NFS4: Couldn't follow remote path
[ 24.986201] NFS: Value for 'minorversion' out of range
mount.nfs4: Numerical result out of range

https://krzk.eu/#/builders/21/builds/1692
Full console log:
https://krzk.eu/#/builders/21/builds/1692/steps/14/logs/serial0

Enabling NFS v4.1 in defconfig seems to help. I can send patches for
this (for defconfigs) but probably the root cause should be fixed as
well.

Environment:
1. Arch ARM Linux
2. exynos_defconfig
3. Exynos boards (Odroid XU3, etc), ARMv7, octa-core (Cortex-A7+A15),
Exynos5422 SoC
4. systemd, boot up with static IP set in kernel command line
5. No swap
6. Kernel, DTB and initramfs are downloaded with TFTP
7. NFS root from NFSv4 server

Let me know if you need more details.

Best regards,
Krzysztof


2020-01-17 11:53:31

by Krzysztof Kozlowski

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

On Thu, Jan 16, 2020 at 07:49:15PM -0500, Scott Mayhew wrote:
> On Thu, 16 Jan 2020, Krzysztof Kozlowski wrote:
>
> > Hi all,
> >
> > Bisect pointed to 6d972518b821 ("NFS: Add fs_context support.") for
> > failures of mounting NFS v4 root on my boards:
> > mount.nfs4 -o vers=4,nolock 192.168.1.10:/srv/nfs/odroidhc1 /new_root
> > [ 24.980839] NFS4: Couldn't follow remote path
> > [ 24.986201] NFS: Value for 'minorversion' out of range
> > mount.nfs4: Numerical result out of range
> >
> > https://krzk.eu/#/builders/21/builds/1692
> > Full console log:
> > https://krzk.eu/#/builders/21/builds/1692/steps/14/logs/serial0
> >
> > Enabling NFS v4.1 in defconfig seems to help. I can send patches for
> > this (for defconfigs) but probably the root cause should be fixed as
> > well.
> >
> > Environment:
> > 1. Arch ARM Linux
> > 2. exynos_defconfig
> > 3. Exynos boards (Odroid XU3, etc), ARMv7, octa-core (Cortex-A7+A15),
> > Exynos5422 SoC
> > 4. systemd, boot up with static IP set in kernel command line
> > 5. No swap
> > 6. Kernel, DTB and initramfs are downloaded with TFTP
> > 7. NFS root from NFSv4 server
> >
> > Let me know if you need more details.
>
> I haven't had much luck reproducing this. I disabled v4.1 in my .config
> and I can still boot a VM with NFS root (granted, I don't really use NFS
> root so this setup is brand new and pretty basic):
>
> [root@localhost ~]# cat /proc/cmdline
> BOOT_IMAGE=mountapi/vmlinuz initrd=mountapi/initrd.img ip=dhcp selinux=0 console=tty0 console=ttyS0,115200 root=nfs4:192.168.122.3:/export/nfsroot/fedora31
>
> [root@localhost ~]# grep nfs /proc/mounts
> 192.168.122.3:/export/nfsroot/fedora31 / nfs rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.69,local_lock=none,addr=192.168.122.3 0 0
>
> Just out of curiosity, what version of the mount.nfs program do you
> have in your initramfs? I'm wondering if it's maybe passing the mount
> options differently than mine. FWIW I'm using version 2.4.2:
>
> [smayhew@aion tmp]$ lsinitrd /var/lib/tftpboot/mountapi/initrd.img|grep mount.nfs
> -rwsr-xr-x 1 root root 208600 Feb 14 2019 usr/sbin/mount.nfs
> lrwxrwxrwx 1 root root 9 Feb 14 2019 usr/sbin/mount.nfs4 -> mount.nfs
> [smayhew@aion tmp]$ /usr/lib/dracut/skipcpio /var/lib/tftpboot/mountapi/initrd.img|zcat|cpio -id usr/sbin/mount.nfs
> 256163 blocks
> [smayhew@aion tmp]$ ./usr/sbin/mount.nfs -V
> mount.nfs: (linux nfs-utils 2.4.2)
> [smayhew@aion tmp]$
>

My binary is:
mount.nfs4: (linux nfs-utils 3.1.1)

This is pretty weird... I extracted this binary from a running system
(Arch Linux Arm) and put it into the initramfs. However, my Arch Linux
now ships v2.4.2-1...

Best regards,
Krzysztof

2020-01-17 13:17:25

by Krzysztof Kozlowski

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

On Fri, Jan 17, 2020 at 12:54:34PM +0000, David Howells wrote:
> Which git tree/branch are you using?
>
> David

The report was from linux-next. The binary is from a regular Arch Linux
Arm system... but now I wonder how I got v3.1, as all my systems
recently have v2.4 or v2.3...

I can replace the binary with v2.4 and try again, although the kernel
should probably not behave differently. So far it was working fine.

Best regards,
Krzysztof

2020-01-17 14:09:41

by David Howells

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

Can you do:

grep NFS .config

for your kernel config?

Thanks,
David

2020-01-17 14:20:49

by David Howells

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

You seem to be running afoul of the check here:

	case Opt_minorversion:
		if (result.uint_32 > NFS4_MAX_MINOR_VERSION)
			goto out_of_bounds;
		ctx->minorversion = result.uint_32;
		break;

which would seem to indicate that the mount process is supplying
minorversion=X as an option. Can you modify your kernel to print param->key
and param->string at the top of nfs_fs_context_parse_param()? Adding
something like:

pr_notice("NFSOP '%s=%s'\n", param->key, param->string);

will likely suffice unless you're directly driving the new mount API - in
which case param->string might be things other than a string, but that's
unlikely. It might also be NULL, but printk should handle that.

David

2020-01-17 14:36:46

by Krzysztof Kozlowski

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

On Fri, Jan 17, 2020 at 02:08:55PM +0000, David Howells wrote:
> Can you do:
>
> grep NFS .config
>
> for your kernel config?

It is a regular exynos_defconfig from the same tree (so linux-next).

Best regards,
Krzysztof

2020-01-17 14:41:13

by Krzysztof Kozlowski

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

On Fri, Jan 17, 2020 at 02:20:03PM +0000, David Howells wrote:
> You seem to be running afoul of the check here:
>
> 	case Opt_minorversion:
> 		if (result.uint_32 > NFS4_MAX_MINOR_VERSION)
> 			goto out_of_bounds;
> 		ctx->minorversion = result.uint_32;
> 		break;
>
> which would seem to indicate that the mount process is supplying
> minorversion=X as an option. Can you modify your kernel to print param->key
> and param->string at the top of nfs_fs_context_parse_param()? Adding
> something like:
>
> pr_notice("NFSOP '%s=%s'\n", param->key, param->string);
>
> will likely suffice unless you're directly driving the new mount API - in
> which case param->string might be things other than a string, but that's
> unlikely. It might also be NULL, but printk should handle that.

The output:

NFS-Mount: 192.168.1.10:/srv/nfs/odroidhc1
Waiting 10 seconds for device /dev/nfs ...
[ 14.652366] random: crng init done
Mount cmd:
mount.nfs4 -o vers=4,nolock 192.168.1.10:/srv/nfs/odroidhc1 /new_root
[ 22.938314] NFSOP 'source=192.168.1.10:/srv/nfs/odroidhc1'
[ 22.942638] NFSOP 'nolock=(null)'
[ 22.945772] NFSOP 'vers=4.2'
[ 22.948660] NFSOP 'addr=192.168.1.10'
[ 22.952350] NFSOP 'clientaddr=192.168.1.12'
[ 22.956831] NFS4: Couldn't follow remote path
[ 22.971001] NFSOP 'source=192.168.1.10:/srv/nfs/odroidhc1'
[ 22.975217] NFSOP 'nolock=(null)'
[ 22.978444] NFSOP 'vers=4'
[ 22.981265] NFSOP 'minorversion=1'
[ 22.984513] NFS: Value for 'minorversion' out of range
mount.nfs4: Numerical result out of range
:: running cleanup hook [udev]
ERROR: Failed to mount the real root device.
Bailing out, you are on your own. Good luck.

sh: can't access tty; job control turned off

Best regards,
Krzysztof

2020-01-17 15:13:08

by David Howells

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

Krzysztof Kozlowski <[email protected]> wrote:

> mount.nfs4 -o vers=4,nolock 192.168.1.10:/srv/nfs/odroidhc1 /new_root

Okay, it looks like the mount command makes two attempts at mounting.
Firstly, it does this:

> [ 22.938314] NFSOP 'source=192.168.1.10:/srv/nfs/odroidhc1'
> [ 22.942638] NFSOP 'nolock=(null)'
> [ 22.945772] NFSOP 'vers=4.2'
> [ 22.948660] NFSOP 'addr=192.168.1.10'
> [ 22.952350] NFSOP 'clientaddr=192.168.1.12'
> [ 22.956831] NFS4: Couldn't follow remote path

Which accepts the "vers=4.2" parameter as there's no check that that is
actually valid given the configuration, but then fails later. Secondly, it
does this:

> [ 22.971001] NFSOP 'source=192.168.1.10:/srv/nfs/odroidhc1'
> [ 22.975217] NFSOP 'nolock=(null)'
> [ 22.978444] NFSOP 'vers=4'
> [ 22.981265] NFSOP 'minorversion=1'
> [ 22.984513] NFS: Value for 'minorversion' out of range
> mount.nfs4: Numerical result out of range

which fails because of the minorversion=1 specification, where the kernel
config didn't enable NFS_V4_1.

It looks like it ought to have failed prior to these patches in the same way:

	case Opt_minorversion:
		if (nfs_get_option_ul(args, &option))
			goto out_invalid_value;
		if (option > NFS4_MAX_MINOR_VERSION)
			goto out_invalid_value;
		mnt->minorversion = option;
		break;

David

2020-01-17 15:17:44

by Trond Myklebust

Subject: Re: [BISECT BUG] NFS v4 root not working after 6d972518b821 ("NFS: Add fs_context support.")

On Fri, 2020-01-17 at 15:12 +0000, David Howells wrote:
> Krzysztof Kozlowski <[email protected]> wrote:
>
> > mount.nfs4 -o vers=4,nolock 192.168.1.10:/srv/nfs/odroidhc1
> > /new_root
>
> Okay, it looks like the mount command makes two attempts at mounting.
> Firstly, it does this:
>
> > [ 22.938314] NFSOP 'source=192.168.1.10:/srv/nfs/odroidhc1'
> > [ 22.942638] NFSOP 'nolock=(null)'
> > [ 22.945772] NFSOP 'vers=4.2'
> > [ 22.948660] NFSOP 'addr=192.168.1.10'
> > [ 22.952350] NFSOP 'clientaddr=192.168.1.12'
> > [ 22.956831] NFS4: Couldn't follow remote path
>
> Which accepts the "vers=4.2" parameter as there's no check that that
> is
> actually valid given the configuration, but then fails
> later. Secondly, it
> does this:
>
> > [ 22.971001] NFSOP 'source=192.168.1.10:/srv/nfs/odroidhc1'
> > [ 22.975217] NFSOP 'nolock=(null)'
> > [ 22.978444] NFSOP 'vers=4'
> > [ 22.981265] NFSOP 'minorversion=1'
> > [ 22.984513] NFS: Value for 'minorversion' out of range
> > mount.nfs4: Numerical result out of range
>
> which fails because of the minorversion=1 specification, where the
> kernel
> config didn't enable NFS_V4_1.
>
> It looks like it ought to have failed prior to these patches in the
> same way:
>
> 	case Opt_minorversion:
> 		if (nfs_get_option_ul(args, &option))
> 			goto out_invalid_value;
> 		if (option > NFS4_MAX_MINOR_VERSION)
> 			goto out_invalid_value;
> 		mnt->minorversion = option;
> 		break;
>

It looks like someone changed the return value from the old EINVAL to
something else? The "Numerical result out of range" message above
suggests it has been changed to ERANGE, which probably is not
supported by 'mount'.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-01-17 15:48:44

by David Howells

Subject: [PATCH] nfs: Return EINVAL rather than ERANGE for mount parse errors

Hi Krzysztof,

Does this patch fix the problem?

David
---
commit 3021f58ee1e2c9659e629d0ccf06d3e0876e805a
Author: David Howells <[email protected]>
Date: Fri Jan 17 15:37:46 2020 +0000

nfs: Return EINVAL rather than ERANGE for mount parse errors

Return EINVAL rather than ERANGE for mount parse errors as the userspace
mount command doesn't necessarily understand what to do with anything other
than EINVAL.

The old code returned -ERANGE as an intermediate error that then got
converted to -EINVAL, whereas the new code returns -ERANGE.

This was induced by passing minorversion=1 to a v4 mount where
CONFIG_NFS_V4_1 was disabled in the kernel build.

Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Reported-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David Howells <[email protected]>

diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
index 429315c011ae..07cbd655dafb 100644
--- a/fs/nfs/fs_context.c
+++ b/fs/nfs/fs_context.c
@@ -770,7 +770,7 @@ static int nfs_fs_context_parse_param(struct fs_context *fc,
 	return nfs_invalf(fc, "NFS: Bad IP address specified");
 out_of_bounds:
 	nfs_invalf(fc, "NFS: Value for '%s' out of range", param->key);
-	return -ERANGE;
+	return -EINVAL;
 }

 /*

2020-01-17 15:55:35

by David Howells

Subject: [PATCH v2] nfs: Return EINVAL rather than ERANGE for mount parse errors

commit b9423c912b770e5b9e4228d90da92b6a69693d8e
Author: David Howells <[email protected]>
Date: Fri Jan 17 15:37:46 2020 +0000

nfs: Return EINVAL rather than ERANGE for mount parse errors

Return EINVAL rather than ERANGE for mount parse errors as the userspace
mount command doesn't necessarily understand what to do with anything other
than EINVAL.

The old code returned -ERANGE as an intermediate error that then got
converted to -EINVAL, whereas the new code returns -ERANGE.

This was induced by passing minorversion=1 to a v4 mount where
CONFIG_NFS_V4_1 was disabled in the kernel build.

Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Reported-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David Howells <[email protected]>

diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
index 429315c011ae..74508ed9aeec 100644
--- a/fs/nfs/fs_context.c
+++ b/fs/nfs/fs_context.c
@@ -769,8 +769,7 @@ static int nfs_fs_context_parse_param(struct fs_context *fc,
 out_invalid_address:
 	return nfs_invalf(fc, "NFS: Bad IP address specified");
 out_of_bounds:
-	nfs_invalf(fc, "NFS: Value for '%s' out of range", param->key);
-	return -ERANGE;
+	return nfs_invalf(fc, "NFS: Value for '%s' out of range", param->key);
 }

 /*

2020-01-17 16:52:02

by Krzysztof Kozlowski

Subject: Re: [PATCH v2] nfs: Return EINVAL rather than ERANGE for mount parse errors

On Fri, Jan 17, 2020 at 03:55:09PM +0000, David Howells wrote:
> commit b9423c912b770e5b9e4228d90da92b6a69693d8e
> Author: David Howells <[email protected]>
> Date: Fri Jan 17 15:37:46 2020 +0000
>
> nfs: Return EINVAL rather than ERANGE for mount parse errors
>
> Return EINVAL rather than ERANGE for mount parse errors as the userspace
> mount command doesn't necessarily understand what to do with anything other
> than EINVAL.
>
> The old code returned -ERANGE as an intermediate error that then got
> converted to -EINVAL, whereas the new code returns -ERANGE.
>
> This was induced by passing minorversion=1 to a v4 mount where
> CONFIG_NFS_V4_1 was disabled in the kernel build.
>
> Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
> Reported-by: Krzysztof Kozlowski <[email protected]>
> Signed-off-by: David Howells <[email protected]>
>
> diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
> index 429315c011ae..74508ed9aeec 100644
> --- a/fs/nfs/fs_context.c
> +++ b/fs/nfs/fs_context.c
> @@ -769,8 +769,7 @@ static int nfs_fs_context_parse_param(struct fs_context *fc,
>  out_invalid_address:
>  	return nfs_invalf(fc, "NFS: Bad IP address specified");
>  out_of_bounds:
> -	nfs_invalf(fc, "NFS: Value for '%s' out of range", param->key);
> -	return -ERANGE;
> +	return nfs_invalf(fc, "NFS: Value for '%s' out of range", param->key);
>  }
>
>  /*

Yes, the board boots up, thanks!

Tested-by: Krzysztof Kozlowski <[email protected]>

I did not run extensive tests, but a few boots also show the NFS root
mounting 2-3 seconds faster (a faster switch from the initramfs to the
proper user space on NFS).

Best regards,
Krzysztof

2020-01-17 17:19:19

by David Howells

Subject: Re: [PATCH v2] nfs: Return EINVAL rather than ERANGE for mount parse errors

Hi Anna,

Can you pick this patch up and add it to your branch?

Thanks,
David

2020-01-17 20:21:56

by Anna Schumaker

Subject: Re: [PATCH v2] nfs: Return EINVAL rather than ERANGE for mount parse errors

On Fri, 2020-01-17 at 17:18 +0000, David Howells wrote:
> Hi Anna,
>
> Can you pick this patch up and add it to your branch?

Sure! I have it applied on my laptop now, and I'll push it out before I sign off
for the weekend.

Thanks for fixing it so quickly!
Anna

>
> Thanks,
> David
>

2020-01-17 21:12:27

by David Howells

Subject: Re: [PATCH v2] nfs: Return EINVAL rather than ERANGE for mount parse errors

Schumaker, Anna <[email protected]> wrote:

> Sure! I have it applied on my laptop now, and I'll push it out before I sign
> off for the weekend.

Ta!

David