2021-11-17 11:13:36

by Joakim Zhang

Subject: NFS break if unplug one when connect two cables


Hi community guys,

I have run into an NFS issue on my side that has been bothering me a lot. Any ideas or insights would be much appreciated. :-)

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Environment:
1) A board with two Ethernet controllers: eth0 is FEC, eth1 is STMMAC.
2) Both Ethernet ports are cabled to the same router.
3) The NFS root filesystem is a Yocto build.

The kernel boot log shows that the NFS root is mounted via eth0:
[ 8.860717] Sending DHCP requests ., OK
[ 9.009096] IP-Config: Got DHCP answer from 10.193.102.254, my address is 10.193.102.150
[ 9.017377] IP-Config: Complete:
[ 9.020737] device=eth0, hwaddr=00:04:9f:06:e2:97, ipaddr=10.193.102.150, mask=255.255.255.0, gw=10.193.102.254
[ 9.032123] host=10.193.102.150, domain=ap.freescale.net, nis-domain=(none)
[ 9.039614] bootserver=0.0.0.0, rootserver=10.193.108.176, rootpath=
[ 9.039625] nameserver0=165.114.89.4, nameserver1=134.27.184.42

After the NFS root was mounted, I dumped the following information:
~# cat /proc/cmdline
console=ttymxc1,115200 root=/dev/nfs ip=dhcp nfsroot=10.193.108.176:/home/nfsroot/imx8mpevk,v3,tcp

~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:04:9f:06:e2:97 brd ff:ff:ff:ff:ff:ff
inet 10.193.102.150/24 brd 10.193.102.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::204:9fff:fe06:e297/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,DYNAMIC,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:04:9f:06:e2:98 brd ff:ff:ff:ff:ff:ff
inet 10.193.102.85/24 brd 10.193.102.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::204:9fff:fe06:e298/64 scope link
valid_lft forever preferred_lft forever
4: can0: <NOARP,ECHO> mtu 16 qdisc noop state DOWN group default qlen 10
link/can

~# ip route
default via 10.193.102.254 dev eth0
10.193.102.0/24 dev eth0 proto kernel scope link src 10.193.102.150
10.193.102.0/24 dev eth1 proto kernel scope link src 10.193.102.85
10.193.102.254 dev eth1 scope link
134.27.184.42 via 10.193.102.254 dev eth1
165.114.89.4 via 10.193.102.254 dev eth1
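
As a quick sanity check on the addresses above, here is a minimal sketch using Python's standard `ipaddress` module. The helper names `same_subnet` and `on_link` are only illustrative; the addresses are copied from the dumps above. It only confirms that eth0 and eth1 share one /24 and that the NFS server lies outside it, so NFS traffic has to go through the gateway.

```python
import ipaddress


def same_subnet(iface_a, iface_b):
    """True if two 'address/prefix' strings fall in the same network."""
    return ipaddress.ip_interface(iface_a).network == ipaddress.ip_interface(iface_b).network


def on_link(address, iface):
    """True if 'address' is inside the network configured on 'iface'."""
    return ipaddress.ip_address(address) in ipaddress.ip_interface(iface).network


ETH0 = "10.193.102.150/24"      # from `ip a`
ETH1 = "10.193.102.85/24"       # from `ip a`
NFS_SERVER = "10.193.108.176"   # from nfsroot= on the kernel command line
GATEWAY = "10.193.102.254"      # from `ip route`

print("eth0 and eth1 in same subnet:", same_subnet(ETH0, ETH1))
print("NFS server on-link for eth0: ", on_link(NFS_SERVER, ETH0))
print("gateway on-link for eth0:    ", on_link(GATEWAY, ETH0))
```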

The issue is:
1) If I unplug eth1 and then re-plug it, NFS does not recover.
2) If I unplug eth0 and then re-plug it, NFS recovers.
My question is: why does unplugging eth1 break NFS? Is it an NFS limitation?

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
To figure this out, I tried the two approaches below.

1) As you can see, eth0 and eth1 are in the same subnet. Could this lead to the issue? First, I connected eth1 to another board. After NFS mounted,
all was well, but minutes later NFS broke. I suppose the reason is that eth1 got a local IP address that belongs to another subnet. So what I can conclude
here is that NFS still works when eth0 and eth1 are in the same subnet, as long as I do not unplug eth1, whereas with different subnets NFS fails outright.
So this issue is not caused by the two interfaces being in the same subnet.

2) I thought this should not be a kernel issue and might be caused by userspace, so I disabled the network services:
~# systemctl disable systemd-networkd.service
~# systemctl disable connman.service
I repeated the test. After NFS mounted, only eth0 was up because the network services were disabled, so I manually brought up eth1 and used udhcpc to get an IP address.

~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:04:9f:06:e2:97 brd ff:ff:ff:ff:ff:ff
inet 10.193.102.53/24 brd 10.193.102.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::204:9fff:fe06:e297/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:04:9f:06:e2:98 brd ff:ff:ff:ff:ff:ff
inet 10.193.102.85/24 brd 10.193.102.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::204:9fff:fe06:e298/64 scope link
valid_lft forever preferred_lft forever

~# ip route
default via 10.193.102.254 dev eth0
default via 10.193.102.254 dev eth1 metric 10
10.193.102.0/24 dev eth0 proto kernel scope link src 10.193.102.53
10.193.102.0/24 dev eth1 proto kernel scope link src 10.193.102.85

Now when I unplug eth1, NFS keeps working. Based on this experiment, the issue seems to be related to the network service.
What I suspect is that either the network service changes the routes when eth1 is unplugged, or the original routes have some limitation.
I compared the routes with and without the network service enabled.

Network service enabled:
~# ip route
default via 10.193.102.254 dev eth0
10.193.102.0/24 dev eth0 proto kernel scope link src 10.193.102.150
10.193.102.0/24 dev eth1 proto kernel scope link src 10.193.102.85
10.193.102.254 dev eth1 scope link
134.27.184.42 via 10.193.102.254 dev eth1
165.114.89.4 via 10.193.102.254 dev eth1

Network service disabled:
~# ip route
default via 10.193.102.254 dev eth0
default via 10.193.102.254 dev eth1 metric 10
10.193.102.0/24 dev eth0 proto kernel scope link src 10.193.102.53
10.193.102.0/24 dev eth1 proto kernel scope link src 10.193.102.85
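
To compare the two tables more concretely, here is a rough sketch of a route lookup over them. It is a simplified model and not the kernel's actual FIB code: it assumes longest-prefix match first, then lowest metric, then listing order as the tie-break, and it ignores the `via` next hop and source-address selection. `parse_routes` and `lookup` are illustrative names. On the board itself, `ip route get <address>` would give the authoritative answer.

```python
import ipaddress

ENABLED = """
default via 10.193.102.254 dev eth0
10.193.102.0/24 dev eth0 proto kernel scope link src 10.193.102.150
10.193.102.0/24 dev eth1 proto kernel scope link src 10.193.102.85
10.193.102.254 dev eth1 scope link
134.27.184.42 via 10.193.102.254 dev eth1
165.114.89.4 via 10.193.102.254 dev eth1
"""

DISABLED = """
default via 10.193.102.254 dev eth0
default via 10.193.102.254 dev eth1 metric 10
10.193.102.0/24 dev eth0 proto kernel scope link src 10.193.102.53
10.193.102.0/24 dev eth1 proto kernel scope link src 10.193.102.85
"""


def parse_routes(text):
    """Parse a small subset of `ip route` output into (network, dev, metric)."""
    routes = []
    for line in text.strip().splitlines():
        tokens = line.split()
        dest = "0.0.0.0/0" if tokens[0] == "default" else tokens[0]
        network = ipaddress.ip_network(dest)  # a bare address becomes a /32
        dev = tokens[tokens.index("dev") + 1]
        metric = int(tokens[tokens.index("metric") + 1]) if "metric" in tokens else 0
        routes.append((network, dev, metric))
    return routes


def lookup(routes, address):
    """Pick the output device: longest prefix, then lowest metric, then first listed."""
    addr = ipaddress.ip_address(address)
    candidates = [r for r in routes if addr in r[0]]
    if not candidates:
        return None
    # min() returns the first of equal keys, so listing order is the final tie-break.
    return min(candidates, key=lambda r: (-r[0].prefixlen, r[2]))[1]


for name, table in (("enabled", ENABLED), ("disabled", DISABLED)):
    routes = parse_routes(table)
    for target in ("10.193.108.176", "10.193.102.254", "165.114.89.4"):
        print(f"service {name}: {target} -> {lookup(routes, target)}")
```

Under these assumptions, the NFS server 10.193.108.176 resolves to eth0 in both tables. The difference is that with the network service enabled, the gateway 10.193.102.254 and both nameservers match host routes on eth1. This only shows where the two tables differ; it does not by itself establish why unplugging eth1 breaks NFS.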

Why is there only one "default" route entry when the network service is enabled, but two entries when it is disabled?

I have little knowledge of this area. Does anyone know more?

Best Regards,
Joakim Zhang