2023-05-10 22:09:13

by NeilBrown

Subject: [PATCH 0/2] Support abstract address for rpcbind in kernel

These two patches cause the SUNRPC layer in Linux to attempt to contact
rpcbind using an AF_UNIX socket with an abstract address before
the existing attempts of AF_UNIX to a socket in the filesystem, and IP
to a well known port.

This allows the benefits of an AF_UNIX connection to be combined with the
benefits of honouring the network namespace when connecting to rpcbind.

For this to be useful, rpcbind must listen on that name, and user-space
tools must also connect to the same address. This requires changes to
rpcbind and to libtirpc. libtirpc currently has a bug which causes
sockets bound to abstract addresses to appear to be unbound, so asking
systemd to pass rpcbind an abstract socket doesn't work - rpcbind
rejects it.

Patches for rpcbind and libtirpc will follow.

NeilBrown


---

NeilBrown (2):
SUNRPC: support abstract unix socket addresses
SUNRPC: attempt to reach rpcbind with an abstract socket name


net/sunrpc/clnt.c | 8 ++++++--
net/sunrpc/rpcb_clnt.c | 39 +++++++++++++++++++++++++++++++--------
net/sunrpc/xprtsock.c | 9 +++++++--
3 files changed, 44 insertions(+), 12 deletions(-)




2023-05-10 22:09:32

by NeilBrown

Subject: [PATCH 2/2] SUNRPC: attempt to reach rpcbind with an abstract socket name

NFS is primarily name-spaced using network namespaces. However it
contacts rpcbind (and gss_proxy) using AF_UNIX sockets which are
name-spaced using the mount namespaces. This requires a container using
NFSv3 (the form that requires rpcbind) to manage both network and mount
namespaces, which can seem an unnecessary burden.

As NFS is primarily a network service it makes sense to use network
namespaces as much as possible, and to prefer to communicate with an
rpcbind running in the same network namespace. This can be done, while
preserving the benefits of AF_UNIX sockets, by using an abstract socket
address.

An abstract address has a nul at the start of sun_path, and a length
that is exactly the complete size of the sockaddr_un up to the end of
the name, NOT including any trailing nul (which is not part of the
address).
Abstract addresses are local to a network namespace - regular AF_UNIX
path names are resolved in the mount namespace, ignoring the network
namespace.
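
As a userspace illustration of the length rule above, here is a minimal
sketch that builds an abstract address and computes the length that would
be passed to bind() or connect(). The helper name make_abstract_addr is
hypothetical; it is not part of this patch or of libtirpc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill *addr with an abstract AF_UNIX address for `name` (given WITHOUT
 * the leading nul) and return the address length to hand to bind() or
 * connect(). Returns 0 if the name does not fit in sun_path.
 * Hypothetical helper, for illustration only.
 */
static socklen_t make_abstract_addr(struct sockaddr_un *addr, const char *name)
{
	size_t len;

	assert(addr != NULL && name != NULL);
	len = strlen(name);

	/* One byte of sun_path is taken by the leading nul. */
	if (len + 1 > sizeof(addr->sun_path))
		return 0;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	/* sun_path[0] stays nul: that is what marks the address as abstract. */
	memcpy(addr->sun_path + 1, name, len);

	/* Header + leading nul + name bytes; no trailing nul is counted. */
	return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}
```

Passing sizeof(struct sockaddr_un) instead of this length would make the
trailing nul padding part of the name, so two sides that disagree on the
length would not find each other.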

This patch causes rpcb to first try an abstract address before
continuing with regular AF_UNIX and then IP addresses. This ensures
backwards compatibility.
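
The ordering described above can be sketched in userspace as follows.
This is not the kernel code: try_unix_connect and connect_with_fallback
are hypothetical names, the socket names are supplied by the caller, and
the final fallback to an IP connection is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Connect a stream socket to an AF_UNIX name. If `abstract` is non-zero
 * the name is placed after a leading nul, otherwise it is used as a
 * filesystem path. Returns a connected fd, or -1 on failure.
 */
static int try_unix_connect(const char *name, int abstract)
{
	struct sockaddr_un addr;
	socklen_t addrlen;
	size_t len;
	int fd;

	assert(name != NULL);
	len = strlen(name);
	if (len + 1 > sizeof(addr.sun_path))
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	memcpy(addr.sun_path + (abstract ? 1 : 0), name, len);
	/*
	 * abstract: header + leading nul + name
	 * pathname: header + name + trailing nul
	 * Both come to the same number of bytes.
	 */
	addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len + 1);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&addr, addrlen) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Try the abstract name first, then fall back to the filesystem path. */
static int connect_with_fallback(const char *abstract_name, const char *pathname)
{
	int fd = try_unix_connect(abstract_name, 1);

	if (fd < 0)
		fd = try_unix_connect(pathname, 0);
	return fd;
}
```

Because the abstract attempt simply fails when nothing is listening on
that name, a system with an unmodified rpcbind ends up on the filesystem
path exactly as before.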

Choosing the name needs some care as the same address will be configured
for rpcbind, and needs to be built in to libtirpc for this enhancement
to be fully successful. There is no formal standard for choosing
abstract addresses. The de facto standard appears to be to use a path
name similar to what would be used for a filesystem AF_UNIX address -
but with a leading nul.

In that case
"\0/var/run/rpcbind.sock"
seems like the best choice. However at this time /var/run is deprecated
in favour of /run, so
"\0/run/rpcbind.sock"
might be better.
Though as we are deliberately moving away from using the filesystem it
might seem more sensible to explicitly break the connection and just
have
"\0rpcbind.socket"
using the same name as the systemd unit file.

This patch chooses the second option, which seems least likely to raise
objections.
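
A practical consequence of the leading nul, whichever name is chosen: in
C, a literal such as "\0/run/rpcbind.sock" looks empty to strlen(), so the
length has to be measured from the second byte. The sketch below
illustrates this in userspace; ABSTRACT_NAME and abstract_path_len are
illustrative names, not identifiers from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same form as the name discussed above: a nul followed by a path. */
#define ABSTRACT_NAME "\0/run/rpcbind.sock"

/*
 * Number of sun_path bytes that belong to an abstract name: the leading
 * nul plus everything up to (not including) the terminating nul.
 */
static size_t abstract_path_len(const char *sun_path)
{
	assert(sun_path != NULL && sun_path[0] == '\0');
	return 1 + strlen(sun_path + 1);
}
```

For this literal, strlen(ABSTRACT_NAME) is 0, while
abstract_path_len(ABSTRACT_NAME) and sizeof(ABSTRACT_NAME) - 1 both count
the leading nul and the 17 characters of the path.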

Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/rpcb_clnt.c | 39 +++++++++++++++++++++++++++++++--------
1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 5a8e6d46809a..a925165f4d0d 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -36,6 +36,7 @@
#include "netns.h"

#define RPCBIND_SOCK_PATHNAME "/var/run/rpcbind.sock"
+#define RPCBIND_SOCK_ABSTRACT_NAME "\0/run/rpcbind.sock"

#define RPCBIND_PROGRAM (100000u)
#define RPCBIND_PORT (111u)
@@ -216,21 +217,22 @@ static void rpcb_set_local(struct net *net, struct rpc_clnt *clnt,
sn->rpcb_users = 1;
}

+/* Evaluate to actual length of the `sockaddr_un' structure. */
+# define SUN_LEN(ptr) (offsetof(struct sockaddr_un, sun_path) \
+ + 1 + strlen((ptr)->sun_path + 1))
+
/*
* Returns zero on success, otherwise a negative errno value
* is returned.
*/
-static int rpcb_create_local_unix(struct net *net)
+static int rpcb_create_af_local(struct net *net,
+ const struct sockaddr_un *addr)
{
- static const struct sockaddr_un rpcb_localaddr_rpcbind = {
- .sun_family = AF_LOCAL,
- .sun_path = RPCBIND_SOCK_PATHNAME,
- };
struct rpc_create_args args = {
.net = net,
.protocol = XPRT_TRANSPORT_LOCAL,
- .address = (struct sockaddr *)&rpcb_localaddr_rpcbind,
- .addrsize = sizeof(rpcb_localaddr_rpcbind),
+ .address = (struct sockaddr *)addr,
+ .addrsize = SUN_LEN(addr),
.servername = "localhost",
.program = &rpcb_program,
.version = RPCBVERS_2,
@@ -269,6 +271,26 @@ static int rpcb_create_local_unix(struct net *net)
return result;
}

+static int rpcb_create_local_abstract(struct net *net)
+{
+ static const struct sockaddr_un rpcb_localaddr_abstract = {
+ .sun_family = AF_LOCAL,
+ .sun_path = RPCBIND_SOCK_ABSTRACT_NAME,
+ };
+
+ return rpcb_create_af_local(net, &rpcb_localaddr_abstract);
+}
+
+static int rpcb_create_local_unix(struct net *net)
+{
+ static const struct sockaddr_un rpcb_localaddr_unix = {
+ .sun_family = AF_LOCAL,
+ .sun_path = RPCBIND_SOCK_PATHNAME,
+ };
+
+ return rpcb_create_af_local(net, &rpcb_localaddr_unix);
+}
+
/*
* Returns zero on success, otherwise a negative errno value
* is returned.
@@ -332,7 +354,8 @@ int rpcb_create_local(struct net *net)
if (rpcb_get_local(net))
goto out;

- if (rpcb_create_local_unix(net) != 0)
+ if (rpcb_create_local_abstract(net) != 0 &&
+ rpcb_create_local_unix(net) != 0)
result = rpcb_create_local_net(net);

out:



2023-05-23 12:23:04

by Petr Vorel

Subject: Re: [PATCH 0/2] Support abstract address for rpcbind in kernel

Hi Neil,

> These two patches cause the SUNRPC layer in Linux to attempt to contact
> rpcbind using an AF_UNIX socket with an abstract address before
> the existing attempts of AF_UNIX to a socket in the filesystem, and IP
> to a well known port.

> This allows the benefits of an AF_UNIX connection to be combined with the
> benefits of honouring the network namespace when connecting to rpcbind.

> For this to be useful, rpcbind must listen on that name, and user-space
> tools must also connect to the same address. This requires changes to
> rpcbind and to libtirpc. libtirpc currently has a bug which causes
> sockets bound to abstract addresses to appear to be unbound, so asking
> systemd to pass rpcbind an abstract socket doesn't work - rpcbind
> rejects it.

> Patches for rpcbind and libtirpc will follow.

Thanks a lot for taking care of this. I finally found time to test it.
I tested all your patchsets on openSUSE with kernel 6.3.1 (built locally),
rpcbind [1] and libtirpc [2], but although all patches LGTM, there is a
failure:

PATH="/opt/ltp/testcases/bin:$PATH" nfslock01.sh -v 3 -t tcp
nfslock01 1 TINFO: IPv6 disabled on lhost via kernel command line or not compiled in
nfslock01 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
nfslock01 1 TINFO: add local addr 10.0.0.2/24
nfslock01 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
nfslock01 1 TINFO: add remote addr 10.0.0.1/24
nfslock01 1 TINFO: Network config (local -- remote):
nfslock01 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
nfslock01 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
nfslock01 1 TINFO: fd00:1:1:1::2/64/ -- fd00:1:1:1::1/64/
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_supported_fs_types.c:157: TINFO: Skipping ext2 as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping ext3 as requested by the test
tst_supported_fs_types.c:90: TINFO: Kernel supports ext4
tst_supported_fs_types.c:55: TINFO: mkfs.ext4 does exist
tst_supported_fs_types.c:90: TINFO: Kernel supports xfs
tst_supported_fs_types.c:55: TINFO: mkfs.xfs does exist
tst_supported_fs_types.c:90: TINFO: Kernel supports btrfs
tst_supported_fs_types.c:55: TINFO: mkfs.btrfs does exist
tst_supported_fs_types.c:157: TINFO: Skipping vfat as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping exfat as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping ntfs as requested by the test
tst_supported_fs_types.c:157: TINFO: Skipping tmpfs as requested by the test
nfslock01 1 TINFO: === Testing on ext4 ===
nfslock01 1 TINFO: Formatting ext4 with opts='/dev/loop0'
nfslock01 1 TINFO: YES TST_FS_TYPE: 'ext4'
nfslock01 1 TINFO: Mounting device: mount -t ext4 /dev/loop0 /tmp/LTP_nfslock01.pLrRsUDH2Y/mntpoint -o i_version
nfslock01 1 TINFO: timeout per run is 0h 5m 0s
nfslock01 1 TINFO: mount.nfs: (linux nfs-utils 2.6.3)
nfslock01 1 TINFO: setup NFSv3, socket type tcp
nfslock01 1 TINFO: Mounting NFS: mount -v -t nfs -o proto=tcp,vers=3 10.0.0.2:/tmp/LTP_nfslock01.pLrRsUDH2Y/mntpoint/3/tcp /tmp/LTP_nfslock01.pLrRsUDH2Y/3/0
mount.nfs: trying 10.0.0.2 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying 10.0.0.2 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: trying 10.0.0.2 prog 100003 vers 3 prot TCP port 2049
mount.nfs: portmap query failed: RPC: Program not registered
mount.nfs: requested NFS version or transport protocol is not supported for /tmp/LTP_nfslock01.pLrRsUDH2Y/3/0
=> pvorel: ERROR above
mount.nfs: timeout set for Tue May 23 07:49:10 2023
mount.nfs: trying text-based options 'proto=tcp,vers=3,addr=10.0.0.2'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying text-based options 'proto=tcp,vers=3,addr=10.0.0.2'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: prog 100005, trying vers=3, prot=6
mount.nfs: trying text-based options 'proto=tcp,vers=3,addr=10.0.0.2'
mount.nfs: prog 100003, trying vers=3, prot=6
mount.nfs: prog 100005, trying vers=3, prot=6
nfslock01 1 TBROK: mount command failed
nfslock01 1 TINFO: Cleaning up testcase
nfslock01 1 TINFO: AppArmor enabled, this may affect test results
nfslock01 1 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
nfslock01 1 TINFO: loaded AppArmor profiles: none

Summary:
passed 0
failed 0
broken 1
skipped 0
warnings 0

I also retested on a single filesystem other than ext4:
PATH="/opt/ltp/testcases/bin:$PATH" LTP_SINGLE_FS_TYPE=btrfs nfslock01.sh -v 3 -t tcp
PATH="/opt/ltp/testcases/bin:$PATH" LTP_SINGLE_FS_TYPE=xfs nfslock01.sh -v 3 -t tcp
But the result is the same: "mount command failed".

BTW other tests fail as well:
PATH="/opt/ltp/testcases/bin:$PATH" LTP_SINGLE_FS_TYPE=btrfs nfs01.sh -t tcp

I also downgraded LTP to slightly older code, from when only a single filesystem
was used (before 9e61bb028), but mount still fails.

Therefore I tested a plain mount on a loop device with the default NFSv4, which works:
dd if=/dev/zero of=/tmp/dev bs=1M count=500
losetup /dev/loop0 /tmp/dev
mkfs.ext2 /dev/loop0
mkdir -p /export
mount /dev/loop0 /export
exportfs -o no_root_squash,async,no_subtree_check,rw localhost:/export
mkdir -p /import
mount localhost:/export /import

df | grep /import
localhost:/export nfs4 467M 0 442M 0% /import

mount | grep /import
localhost:/export on /import type nfs4 (rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1)

But testing NFSv3 does not work (nothing interesting in dmesg):
umount /import
mount -o proto=tcp,vers=3 localhost:/export /import

As expected, kernel 6.2.12 with the same NFS config and unmodified libtirpc and rpcbind works:

localhost:/export on /import type nfs (rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=127.0.0.1,mountvers=3,mountport=20048,mountproto=tcp,local_lock=none,addr=127.0.0.1)

I double-checked that I backported everything correctly, so I suspect there is
some problem with the code.

Kind regards,
Petr

[1] https://build.opensuse.org/package/binaries/home:pevik:branches:network/rpcbind/openSUSE_Tumbleweed
[2] https://build.opensuse.org/package/show/home:pevik:branches:Base:System/libtirpc

> NeilBrown


> ---

> NeilBrown (2):
> SUNRPC: support abstract unix socket addresses
> SUNRPC: attempt to reach rpcbind with an abstract socket name


> net/sunrpc/clnt.c | 8 ++++++--
> net/sunrpc/rpcb_clnt.c | 39 +++++++++++++++++++++++++++++++--------
> net/sunrpc/xprtsock.c | 9 +++++++--
> 3 files changed, 44 insertions(+), 12 deletions(-)