2022-01-20 14:33:24

by Petr Vorel

Subject: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

Hi all,

this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
looks to be failing on NFS v3:

"not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
for lockd already active for root namespace. This breaks nfs3 file locking."

This error has been hidden; it shows up only with an extra patch from Nikita [2].
Because the patch has not been merged yet, if you want to verify it yourself,
feel free to use the nfs_flock/fail-on-error branch of my LTP fork, which
carries this patch plus strace debugging [3]:

# PATH="/opt/ltp/testcases/bin:$PATH" /opt/ltp/testcases/bin/nfslock01 -t tcp -v 3
...
nfslock01 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
nfslock01 1 TINFO: add local addr 10.0.0.2/24
nfslock01 1 TINFO: add local addr fd00:1:1:1::2/64
nfslock01 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
nfslock01 1 TINFO: add remote addr 10.0.0.1/24
nfslock01 1 TINFO: add remote addr fd00:1:1:1::1/64
nfslock01 1 TINFO: Network config (local -- remote):
nfslock01 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
nfslock01 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
nfslock01 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
nfslock01 1 TINFO: timeout per run is 0h 5m 0s
nfslock01 1 TINFO: setup NFSv3, socket type tcp
nfslock01 1 TINFO: Mounting NFS: mount -v -t nfs -o proto=tcp,vers=3 10.0.0.2:/tmp/LTP_nfslock01.PAYCDFih75/3/tcp /tmp/LTP_nfslock01.PAYCDFih75/3/0
nfslock01 1 TINFO: creating test files
nfslock01 1 TINFO: Testing locking
nfslock01 1 TINFO: locking 'flock_idata' file and writing data
nfslock01 1 TINFO: waiting for pids: 2022 2023
execve("/opt/ltp/testcases/bin/nfs_flock", ["nfs_flock", "0", "flock_idata"], 0x7ffd4dae5880 /* 206 vars */execve("/opt/ltp/testcases/bin/nfs_flock", ["nfs_flock", "1", "flock_idata"], 0x7ffee8d52690 /* 206 vars */) = 0
brk(NULL) = 0x555ad67cc000
...
openat(AT_FDCWD, "flock_idata", O_RDWR) = 3
) = 3
fcntl(3, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=64, l_len=64}fcntl(3, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=64}) = -1 ENOLCK (No locks available)
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x1), ...}, AT_EMPTY_PATH) = 0
brk(NULL) = 0x55aefc2d5000
brk(0x55aefc2f6000) = 0x55aefc2f6000
write(1, "failed in writeb_lock, Errno = 3"..., 34failed in writeb_lock, Errno = 37
) = 34
exit_group(1) = ?
+++ exited with 1 +++
) = -1 ENOLCK (No locks available)
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0x88, 0x1), ...}, AT_EMPTY_PATH) = 0
brk(NULL) = 0x555ad67cc000
brk(0x555ad67ed000) = 0x555ad67ed000
write(1, "failed in writeb_lock, Errno = 3"..., 34failed in writeb_lock, Errno = 37
) = 34
exit_group(1) = ?
+++ exited with 1 +++
nfslock01 1 TFAIL: nfs_lock process failed
...

Dmesg shows "lockd: cannot monitor 10.0.0.2"; the test fails in
fcntl(fd, F_SETLKW, &lock) with ENOLCK, where lock.l_whence is SEEK_SET.
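
For readers who want to poke at the failing call outside of LTP, here is a
minimal sketch of the same kind of request seen in the strace output: a blocking
write lock on a 64-byte region counted from the start of the file. It is not
the nfs_flock helper itself; the function name lock_region and the scratch file
are assumptions made for illustration only.

```python
import errno
import fcntl
import os
import tempfile


def lock_region(fd, start, length):
    """Take a blocking write lock on [start, start + length); return 0 or the errno."""
    try:
        # CPython implements lockf() on top of fcntl(); without LOCK_NB the
        # request blocks, which corresponds to F_SETLKW with l_type=F_WRLCK.
        fcntl.lockf(fd, fcntl.LOCK_EX, length, start, os.SEEK_SET)
        return 0
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    # Hypothetical scratch file; to reproduce the report it would have to
    # live on the NFSv3 mount instead of a local temporary directory.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * 128)
        tmp.flush()
        rc = lock_region(tmp.fileno(), 0, 64)
        print("locked" if rc == 0 else "failed: " + errno.errorcode.get(rc, str(rc)))
```

On a local filesystem the call is expected to succeed; on the broken NFSv3
mount described here, the same request is the one that comes back with ENOLCK.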

Running with other NFS versions (-v 4, -v 4.1 or -v 4.2) works OK.
I tested only on TCP because UDP was recently disabled by default.

I found this behaviour on various kernels (openSUSE 5.16, Debian: 5.16, 5.10,
SLES 5.14 and 5.3 - both heavily patched).

Is it a bug in lockd or in a test? Is there some limitation on v3?

Kind regards,
Petr


2022-01-20 14:51:46

by Nikita Yushchenko

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

18.01.2022 18:26, Petr Vorel wrote:
> Hi all,
>
> this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
> looks to be failing on NFS v3:
>
> "not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
> inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
> ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
> for lockd already active for root namespace. This breaks nfs3 file locking."

What exactly happens is:

Test runs 'mount' in non-root netns, trying to mount a directory from root netns of the same host via nfsv3

(Part of) call chain inside the kernel

nfs_try_get_tree()
nfs3_create_server()
nfs_create_server()
nfs_init_server()
nfs_start_lockd()
nlmclnt_init()
lockd_up()
svc_bind()
svc_rpcb_setup()
rpcb_create_local()

... and at this point it tries AF_UNIX connection to /var/run/rpcbind.sock

AF_UNIX is not netns-aware.
So it connects to host's rpcbind.
And it overwrites the ports registered in the host's rpcbind by the lockd instance for the root namespace.
From this point on, the lockd instance for the root namespace is no longer reachable (it still listens, but
nobody can learn its ports). Thus nfs locks don't work.
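
The effect described above can be illustrated with a toy model. This is only a
sketch of the bookkeeping, not of the real rpcbind protocol or of the kernel
code: the class RpcbindModel, its method names and both port numbers are made
up for illustration. The only real-world values are the RPC program number of
the lock manager (100021) and NLM version 4, which is the one used with NFSv3.

```python
NLM_PROGRAM = 100021  # RPC program number of the NFS lock manager (nlockmgr)


class RpcbindModel:
    """Toy stand-in for one rpcbind daemon: (program, version, protocol) -> port."""

    def __init__(self):
        self.ports = {}

    def register(self, program, version, protocol, port):
        # Simplification: a later registration for the same key replaces
        # the earlier one, which is the net effect described in the thread.
        self.ports[(program, version, protocol)] = port

    def getport(self, program, version, protocol):
        # 0 means "not registered".
        return self.ports.get((program, version, protocol), 0)


host_rpcbind = RpcbindModel()

# lockd of the root netns registers first (port number is made up).
host_rpcbind.register(NLM_PROGRAM, 4, "tcp", 40001)

# lockd of ltpns reaches the very same daemon through the shared AF_UNIX socket.
host_rpcbind.register(NLM_PROGRAM, 4, "tcp", 40002)

# Clients asking the host's rpcbind now learn the port of the ltpns instance.
advertised = host_rpcbind.getport(NLM_PROGRAM, 4, "tcp")
print(advertised)
```

In this model the root-netns lockd still "listens" on its original port, but
the only port a client can look up is the one registered last, and nothing
listens on that port in the root netns.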

I'm not sure what is the correct behavior here.

Maybe rpcb_create_local() shall detect that it is not in root netns, and only try AF_INET connection to
localhost in that case.

Maybe it shall not try AF_UNIX at all. Are there any realistic cases when rpcbind is accessible via
AF_UNIX only?

Nikita

2022-01-21 09:38:47

by NeilBrown

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

On Wed, 19 Jan 2022, Petr Vorel wrote:
> Hi all,
>
> this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
> looks to be failing on NFS v3:
>
> "not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
> inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
> ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
> for lockd already active for root namespace. This breaks nfs3 file locking."

"not unsharing /var" .... can this be fixed by simply unsharing /var?
Or is that not simple?

One could easily argue that RPCBIND_SOCK_PATHNAME in the kernel should be
changed to "/run/rpcbind.sock". Does this test suite unshare /run?

BTW, your email contains [1], [2], etc which suggests there are links
somewhere - but there aren't.

NeilBrown

2022-01-21 10:06:54

by NeilBrown

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

On Wed, 19 Jan 2022, Nikita Yushchenko wrote:
> 18.01.2022 18:26, Petr Vorel wrote:
> > Hi all,
> >
> > this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
> > looks to be failing on NFS v3:
> >
> > "not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
> > inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
> > ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
> > for lockd already active for root namespace. This breaks nfs3 file locking."
>
> What exactly happens is:
>
> Test runs 'mount' in non-root netns, trying to mount a directory from root netns of the same host via nfsv3
>
> (Part of) call chain inside the kernel
>
> nfs_try_get_tree()
> nfs3_create_server()
> nfs_create_server()
> nfs_init_server()
> nfs_start_lockd()
> nlmclnt_init()
> lockd_up()
> svc_bind()
> svc_rpcb_setup()
> rpcb_create_local()
>
> ... and at this point it tries AF_UNIX connection to /var/run/rpcbind.sock
>
> AF_UNIX is not netns-aware.
> So it connects to host's rpcbind.
> And overwrites ports registered in host's rpcbind by lockd instance for root namespace. Since this
> point, lockd instance for root namespace becomes no longer accessible (it still listens but nobody can
> learn the ports). Thus nfs locks don't work.
>
> I'm not sure what is the correct behavior here.
>
> Maybe rpcb_create_local() shall detect that it is not in root netns, and only try AF_INET connection to
> localhost in that case.

That would be simple and might be sensible. If changing the AF_UNIX
path to "/run/rpcbind.sock" isn't sufficient, then testing for the
root_ns is probably the best second option.

Thanks,
NeilBrown

2022-01-21 16:39:16

by Nikita Yushchenko

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

19.01.2022 01:11, NeilBrown wrote:
> On Wed, 19 Jan 2022, Petr Vorel wrote:
>> Hi all,
>>
>> this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
>> looks to be failing on NFS v3:
>>
>> "not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
>> inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
>> ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
>> for lockd already active for root namespace. This breaks nfs3 file locking."
>
> "not unsharing /var" .... can this be fixed by simply unsharing /var?
> Or is that not simple?

Big picture is - lockd tries to be per-netns, but lockd isn't standalone, it depends on rpcbind, and
rpcbind isn't guaranteed to be per-netns.

One can argue that it is not kernel's job to provide per-netns rpcbind.

Still, the current situation is that, by default, doing an nfs mount from within netns B immediately breaks
lockd serving nfs mounts exported from a different netns A. "By default" = "unless the process doing the
nfs mount in netns B is also in a different mount namespace where RPCBIND_SOCK_PATHNAME does not point
to the AF_UNIX socket owned by the rpcbind serving netns A".

Although in LTP's 'nfslock01' test the "non working locking" is reproduced on the same mount that
triggered the breakage, the breakage is not limited to that mount. Since that mount operation in netns
B, any client of nfs exports from netns A will get locking broken - including clients running on
different physical hosts.

I'd say that using AF_UNIX connection from lockd to rpcbind does not play well with per-netns lockd.

Solution to use AF_UNIX connection to rpcbind only for lockd serving root netns, and using AF_INET
otherwise - looks more sane.
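
The proposed rule can be sketched from userspace as follows. This is only an
illustration of the decision, not kernel code: the helper names netns_id and
choose_rpcbind_transport are hypothetical, and comparing /proc/<pid>/ns/net
links is an assumed userspace approximation of "am I in the root netns".

```python
import os


def netns_id(pid="self"):
    """Return the network-namespace link target of a process, e.g. 'net:[<inode>]'."""
    return os.readlink(f"/proc/{pid}/ns/net")


def choose_rpcbind_transport(current_ns, root_ns):
    """AF_UNIX only when the caller is in the root netns, AF_INET to localhost otherwise."""
    return "AF_UNIX" if current_ns == root_ns else "AF_INET"


if __name__ == "__main__":
    try:
        # Reading the link of PID 1 usually requires root privileges.
        print(choose_rpcbind_transport(netns_id(), netns_id(1)))
    except OSError as exc:
        print("cannot compare namespaces:", exc)
```

With such a rule, a lockd instance created for ltpns would never touch the
AF_UNIX socket of the host's rpcbind, whatever the mount namespace looks like.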

> One could easily argue that RPCBIND_SOCK_PATHNAME in the kernel should be
> changed to "/run/rpcbind.sock".

It may be a better idea to make it configurable per-netns.

Nikita

2022-01-21 16:55:46

by Nikita Yushchenko

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

> Big picture is - lockd tries to be per-netns, but lockd isn't standalone, it depends on rpcbind, and
> rpcbind isn't guaranteed to be per-netns.
>
> One can argue that it is not kernel's job to provide per-netns rpcbind.
>
> Still, the current situation is that, by default, doing an nfs mount from within netns B immediately breaks
> lockd serving nfs mounts exported from a different netns A. "By default" = "unless the process doing the
> nfs mount in netns B is also in a different mount namespace where RPCBIND_SOCK_PATHNAME does not point
> to the AF_UNIX socket owned by the rpcbind serving netns A".
>
> Although in LTP's 'nfslock01' test the "non working locking" is reproduced on the same mount that
> triggered the breakage, the breakage is not limited to that mount. Since that mount operation in netns
> B, any client of nfs exports from netns A will get locking broken - including clients running on
> different physical hosts.
>
> I'd say that using AF_UNIX connection from lockd to rpcbind does not play well with per-netns lockd.
>
> Solution to use AF_UNIX connection to rpcbind only for lockd serving root netns, and using AF_INET
> otherwise - looks more sane.

Btw, I'm not sure (I did not test) what will happen if an nfs server is similarly started in netns B. Will
it hijack requests addressed to the nfs server running in netns A?

Nikita

2022-01-21 16:55:59

by Nikita Yushchenko

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

19.01.2022 08:26, Nikita Yushchenko wrote:
>> Big picture is - lockd tries to be per-netns, but lockd isn't standalone, it depends on rpcbind, and
>> rpcbind isn't guaranteed to be per-netns.
>>
>> One can argue that it is not kernel's job to provide per-netns rpcbind.
>>
>> Still, the current situation is that, by default, doing an nfs mount from within netns B immediately
>> breaks lockd serving nfs mounts exported from a different netns A. "By default" = "unless the process
>> doing the nfs mount in netns B is also in a different mount namespace where RPCBIND_SOCK_PATHNAME does
>> not point to the AF_UNIX socket owned by the rpcbind serving netns A".
>>
>> Although in LTP's 'nfslock01' test the "non working locking" is reproduced on the same mount that
>> triggered the breakage, the breakage is not limited to that mount. Since that mount operation in netns
>> B, any client of nfs exports from netns A will get locking broken - including clients running on
>> different physical hosts.
>>
>> I'd say that using AF_UNIX connection from lockd to rpcbind does not play well with per-netns lockd.
>>
>> Solution to use AF_UNIX connection to rpcbind only for lockd serving root netns, and using AF_INET
>> otherwise - looks more sane.
>
> Btw, I'm not sure (I did not test) what will happen if an nfs server is similarly started in netns B. Will
> it hijack requests addressed to the nfs server running in netns A?

No, it won't "hijack" them, because it will still listen inside netns B only. But if the ports in rpcbind
get overwritten in a similar manner, the nfs server running in netns A will no longer be reachable.

2022-01-21 21:58:26

by Petr Vorel

Subject: Re: LTP nfslock01 test failing on NFS v3 (lockd: cannot monitor 10.0.0.2)

Hi Neil, all,

> On Wed, 19 Jan 2022, Petr Vorel wrote:
> > Hi all,

> > this is a test failure posted by Nikita Yushchenko [1]. LTP NFS test nfslock01
> > looks to be failing on NFS v3:

> > "not unsharing /var makes AF_UNIX socket for host's rpcbind to become available
> > inside ltpns. Then, at nfs3 mount time, kernel creates an instance of lockd for
> > ltpns, and ports for that instance leak to host's rpcbind and overwrite ports
> > for lockd already active for root namespace. This breaks nfs3 file locking."

> "not unsharing /var" .... can this be fixed by simply unsharing /var?
> Or is that not simple?

> One could easily argue that RPCBIND_SOCK_PATHNAME in the kernel should be
> changed to "/run/rpcbind.sock". Does this test suite unshare /run?

> BTW, your email contains [1], [2], etc which suggests there are links
> somewhere - but there aren't.
I'm sorry, here they are:

[1] https://lore.kernel.org/ltp/[email protected]/
(the report)

[2] https://lore.kernel.org/ltp/[email protected]/
(Nikita's LTP patch, not yet merged)

[3] https://github.com/pevik/ltp/commits/nfs_flock/fail-on-error
(my LTP fork with Nikita's patch [2] plus strace debugging; the output in my
report was produced with this code)

Kind regards,
Petr

> NeilBrown