Hi all,
I'm not subscribed to the list so please CC me on replies. I'm seeing
some odd behaviour and I'm not sure what's going on.
The short version is that I can mount NFS exports from this server
(including mounting on the server itself) via TCP but not via UDP.
rpcinfo on the server gives the following:
root@controller-0:/var/home/sysadmin# rpcinfo -s
program version(s) netid(s) service owner
100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser
100024 1 tcp6,udp6,tcp,udp status 116
100003 4,3 udp6,tcp6,udp,tcp nfs superuser
100227 3 udp6,tcp6,udp,tcp - superuser
100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser
100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser
root@controller-0:/var/home/sysadmin# rpcinfo -p
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 44993 status
100024 1 tcp 60561 status
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049
100003 3 udp 2049 nfs
100227 3 udp 2049
100021 1 udp 36993 nlockmgr
100021 3 udp 36993 nlockmgr
100021 4 udp 36993 nlockmgr
100021 1 tcp 33239 nlockmgr
100021 3 tcp 33239 nlockmgr
100021 4 tcp 33239 nlockmgr
100005 1 udp 43636 mountd
100005 1 tcp 49357 mountd
100005 2 udp 39783 mountd
100005 2 tcp 54527 mountd
100005 3 udp 40970 mountd
100005 3 tcp 41761 mountd
So it *looks* okay to me, but if I probe it I get the expected response
for TCP but not for UDP:
root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
program 100003 version 4 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
program 100003 version 3 ready and waiting
root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
rpcinfo: RPC: Timed out
program 100003 version 3 is not available
Oddly the last command returns immediately so it's not actually timing
out, the error message is a bit misleading. If I sniff the network
traffic while running that last command, I see this:
root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length
262144 bytes
20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF],
proto UDP (17), length 68)
localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF],
proto UDP (17), length 28)
localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!]
UDP, length 0
Ignoring the checksums, I get this:
root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length
262144 bytes
20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF],
proto UDP (17), length 68)
localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF],
proto UDP (17), length 28)
localhost.nfs > localhost.934: UDP, length 0
On a separate system with a 5.10 kernel I get a successful response that
is longer and is properly decoded:
root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length
262144 bytes
22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto
UDP (17), length 68)
localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto
UDP (17), length 52)
localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24
null
Anyone have any ideas what might be going on or how to debug? I'm
building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives
anything useful.
Thanks,
Chris
> On Jun 13, 2024, at 6:44 PM, Chris Friesen <[email protected]> wrote:
>
> Hi all,
>
> I'm not subscribed to the list so please CC me on replies. I'm seeing some odd behaviour and I'm not sure what's going on.
>
> The short version is that I can mount NFS exports from this server (including mounting on the server itself) via TCP but not via UDP.
>
> rpcinfo on the server gives the following:
>
> root@controller-0:/var/home/sysadmin# rpcinfo -s
> program version(s) netid(s) service owner
> 100000 2,3,4 local,udp,tcp,udp6,tcp6 portmapper superuser
> 100024 1 tcp6,udp6,tcp,udp status 116
> 100003 4,3 udp6,tcp6,udp,tcp nfs superuser
> 100227 3 udp6,tcp6,udp,tcp - superuser
> 100021 4,3,1 tcp6,udp6,tcp,udp nlockmgr superuser
> 100005 3,2,1 tcp6,udp6,tcp,udp mountd superuser
>
> root@controller-0:/var/home/sysadmin# rpcinfo -p
> program vers proto port service
> 100000 4 tcp 111 portmapper
> 100000 3 tcp 111 portmapper
> 100000 2 tcp 111 portmapper
> 100000 4 udp 111 portmapper
> 100000 3 udp 111 portmapper
> 100000 2 udp 111 portmapper
> 100024 1 udp 44993 status
> 100024 1 tcp 60561 status
> 100003 3 tcp 2049 nfs
> 100003 4 tcp 2049 nfs
> 100227 3 tcp 2049
> 100003 3 udp 2049 nfs
> 100227 3 udp 2049
> 100021 1 udp 36993 nlockmgr
> 100021 3 udp 36993 nlockmgr
> 100021 4 udp 36993 nlockmgr
> 100021 1 tcp 33239 nlockmgr
> 100021 3 tcp 33239 nlockmgr
> 100021 4 tcp 33239 nlockmgr
> 100005 1 udp 43636 mountd
> 100005 1 tcp 49357 mountd
> 100005 2 udp 39783 mountd
> 100005 2 tcp 54527 mountd
> 100005 3 udp 40970 mountd
> 100005 3 tcp 41761 mountd
>
>
> So it *looks* okay to me, but if I probe it I get the expected response for TCP but not for UDP:
>
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 4
> program 100003 version 4 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T tcp localhost 100003 3
> program 100003 version 3 ready and waiting
> root@controller-0:/var/home/sysadmin# rpcinfo -T udp localhost 100003 3
> rpcinfo: RPC: Timed out
> program 100003 version 3 is not available
>
>
> Oddly the last command returns immediately so it's not actually timing out, the error message is a bit misleading. If I sniff the network traffic while running that last command, I see this:
>
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:20:53.159743 IP (tos 0x0, ttl 64, id 58251, offset 0, flags [DF], proto UDP (17), length 68)
> localhost.887 > localhost.nfs: NFS request xid 1717614110 40 null
> 20:20:53.160016 IP (tos 0x0, ttl 64, id 58252, offset 0, flags [DF], proto UDP (17), length 28)
> localhost.nfs > localhost.887: [bad udp cksum 0xfe1b -> 0xf663!] UDP, length 0
>
>
> Ignoring the checksums, I get this:
>
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 20:21:49.642588 IP (tos 0x0, ttl 64, id 22026, offset 0, flags [DF], proto UDP (17), length 68)
> localhost.934 > localhost.nfs: NFS request xid 1718297876 40 null
> 20:21:49.642697 IP (tos 0x0, ttl 64, id 22027, offset 0, flags [DF], proto UDP (17), length 28)
> localhost.nfs > localhost.934: UDP, length 0
>
>
> On a separate system with a 5.10 kernel I get a successful response that is longer and is properly decoded:
>
> root@controller-0:/var/home/sysadmin# tcpdump -vvv -K -i lo port 2049
> tcpdump: listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 22:22:00.388497 IP (tos 0x0, ttl 64, id 806, offset 0, flags [DF], proto UDP (17), length 68)
> localhost.808 > localhost.nfs: NFS request xid 1718037578 40 null
> 22:22:00.388525 IP (tos 0x0, ttl 64, id 807, offset 0, flags [DF], proto UDP (17), length 52)
> localhost.nfs > localhost.808: NFS reply xid 1718037578 reply ok 24 null
>
>
> Anyone have any ideas what might be going on or how to debug? I'm building a kernel with CONFIG_SUNRPC_DEBUG enabled to see if that gives anything useful.
linux-6.6.15 has "SUNRPC: use request size to initialize bio_vec in
svc_udp_sendto()" which might address this issue.
--
Chuck Lever