Hi Linux NFS Mailing List,
Please excuse any incorrect terminology - I'm not an expert in this
space but am learning :).
I have recently been experimenting with NFS re-exporting in the 5.11
kernel and have come across an interesting quirk when re-exporting a
directory that has subdirectories on different filesystems. I'll do my
best to give a concise overview of my setup and the issue I am seeing
below.
I'll use the following terms to describe the components from here on out:
Source NFS Filer: Refers to the originating NFS Server that contains
the exports that we want to re-export (Ubuntu 20.04.2 /
5.4.0-1038-gcp).
Re-exporting NFS Filer: Refers to the NFS Server that is mounting and
re-exporting the Source NFS Filer (Ubuntu 20.10 /
5.11.0-051100-generic).
NFS Client: Refers to the server that is mounting the re-exported
directories on the Re-exporting NFS Filer (Ubuntu 18.04 /
5.4.0-1036-gcp).
(Source NFS Filer <-- Re-exporting NFS Filer <-- NFS Client)
---
Source NFS Filer:
The source NFS Filer consists of two directories that are each mounted
to filesystems on different disks:
root@demo-nfs-filer-kernel-community:/home/benmaynard# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 100G 0 disk /files
sdc 8:32 0 100G 0 disk /files/disk2
The /files directory is then exported from the source filer, with the
crossmnt option set so that clients can move to the child filesystems:
root@demo-nfs-filer-kernel-community:/files# cat /etc/exports
/files 10.0.0.0/8(rw,wdelay,no_root_squash,no_subtree_check,fsid=10,sec=sys,rw,secure,no_root_squash,no_all_squash,crossmnt)
--
Re-exporting NFS Filer:
From the Re-exporting NFS Filer, I can then mount this exported directory:
root@reexport-server:/home/benmaynard# mount -t nfs -o vers=3,sync 10.67.10.54:/files /files
I am able to successfully list the contents of both the /files and
/files/disk2 directories:
root@reexport-server:/home/benmaynard# ls /files/
a_file disk2 lost+found
root@reexport-server:/home/benmaynard# ls /files/disk2/
b_file lost+found
Now time to re-export (using the same options as the Source NFS Filer
including crossmnt):
root@reexport-server:/home/benmaynard# cat /etc/exports
/files 10.0.0.0/8(rw,wdelay,no_root_squash,no_subtree_check,fsid=10,sec=sys,rw,secure,no_root_squash,no_all_squash,crossmnt)
--
NFS Client:
I now want to mount the re-exported directory from my NFS Client:
root@client-vm:/home/benmaynard# mount -t nfs -o vers=3,sync 10.67.10.51:/files /files
The mount completes successfully, but when listing the contents of the
directory I receive an Input/output error for the directory that is on
a different filesystem:
root@client-vm:/home/benmaynard# ls /files
ls: cannot access '/files/disk2': Input/output error
a_file disk2 lost+found
root@client-vm:/home/benmaynard# ls /files/disk2
ls: cannot access '/files/disk2': Input/output error
If I unmount and mount the Source NFS Filer directly, I am able to
read all of the files. The problem seems to be caused by the
re-export.
In my efforts to resolve this I tried a number of different options,
and the only way I was able to get things working was to explicitly
re-export the directories that are on different filesystems from the
Re-exporting NFS Filer (the Source NFS Filer exports and the NFS Client
mount command remain unchanged):
root@reexport-server:/home/benmaynard# cat /etc/exports
/files 10.0.0.0/8(rw,wdelay,no_root_squash,no_subtree_check,fsid=10,sec=sys,rw,secure,no_root_squash,no_all_squash,crossmnt)
/files/disk2 10.0.0.0/8(rw,wdelay,no_root_squash,no_subtree_check,fsid=20,sec=sys,rw,secure,no_root_squash,no_all_squash,crossmnt)
With these export options I can successfully mount the Re-exporting
NFS Filer and browse both directories:
root@client-vm:/home/benmaynard# mount -t nfs -o vers=3,sync 10.67.10.51:/files /files
root@client-vm:/home/benmaynard# ls /files
a_file disk2 lost+found
root@client-vm:/home/benmaynard# ls /files/disk2/
b_file lost+found
It appears as if the issue is something to do with the re-export, but
I am struggling to track down the actual cause.
This is causing some challenges as the source NFS filer that I am
using in production has thousands of subdirectories on different
volumes. Manually re-exporting each of these from our re-export server
is a challenge.
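A minimal sketch of how those per-filesystem export lines could be
generated instead of written by hand. The function name gen_exports is
made up for illustration, and the client range and option list are
assumptions copied from the examples above with the repeated options
dropped. It reads one mount point per line and gives each path its own
fsid, counting up from a starting value:

```shell
#!/bin/sh
# Hypothetical helper, not part of nfs-utils.
# CLIENTS and OPTS are assumptions based on the exports shown in this thread.
CLIENTS="10.0.0.0/8"
OPTS="rw,wdelay,no_root_squash,no_subtree_check,sec=sys,secure,no_all_squash,crossmnt"

# Read mount points on stdin (one per line) and print one exports line
# per path, each with a distinct fsid starting at $1 (default 10).
gen_exports() {
    fsid="${1:-10}"
    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip blank lines
        printf '%s %s(%s,fsid=%s)\n' "$path" "$CLIENTS" "$OPTS" "$fsid"
        fsid=$((fsid + 1))
    done
}
```

Something like "printf '/files\n/files/disk2\n' | gen_exports 10" would
print two lines, one per path. The list of mount points has to come from
somewhere else, and the input order should stay stable between runs,
since the fsid ends up inside the filehandles that clients hold. Paths
containing spaces are not handled here.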
I have a lab environment configured, so I'm happy to do some more
testing or run commands if it helps with troubleshooting.
On Tue, Mar 23, 2021 at 08:08:38PM +0000, Benjamin Maynard wrote:
> Hi Linux NFS Mailing List,
>
> Please excuse any incorrect terminology - I'm not an expert in this
> space but am learning :).
>
> I have recently been experimenting with NFS re-exporting in the 5.11
> kernel and have come across an interesting quirk when re-exporting a
> directory that has subdirectories on different filesystems. I'll do my
> best to give a concise overview of my setup and the issue I am seeing
> below.
I bet you're encountering a special case of the rule that the "fsid="
option is required on any export of an NFS filesystem.
The "fsid=" option on the parent export doesn't help us with any
children found underneath that export; different filesystems have to
have different fsids.
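As a rough illustration of that rule, a check along these lines could
flag fsid values that are reused within an exports file. The function
name dup_fsids is made up for this sketch, and it assumes every export
line describes a separate filesystem, so a single path exported to
several client specs with the same fsid would also be reported:

```shell
#!/bin/sh
# Hypothetical check, not part of nfs-utils.
# Read exports-style text on stdin and print each fsid=VALUE token
# that occurs more than once.
dup_fsids() {
    grep -v '^[[:space:]]*#' |      # ignore comment lines
        grep -o 'fsid=[^,)]*' |     # extract every fsid=VALUE token
        sort | uniq -d              # keep only tokens seen more than once
}
```

Running "dup_fsids < /etc/exports" and getting no output would suggest
that no fsid value is repeated.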
So, you probably hit the first -EINVAL in fs/nfsd/export.c
check_export(). You could confirm that by turning on some debugging on
the re-exporting server (rpcdebug -m nfsd -s export) and checking the
logs, which should get "exp_export: export of non-dev fs without fsid"
when you hit the IO error.
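A small sketch of that check, assuming the debug output lands in the
kernel log. The helper name find_export_failures is made up here; it
only filters log text for the message quoted above, plus the looser
"fsid= required" wording that some log lines may use instead:

```shell
#!/bin/sh
# Hypothetical helper: filter log text on stdin for nfsd export
# failures that mention a missing fsid.
find_export_failures() {
    grep -E 'export of non-dev fs without fsid|fsid= required'
}
```

On the re-exporting server, as root, this could be used roughly as:
run "rpcdebug -m nfsd -s export", reproduce the "ls /files/disk2" error
from the client, run "dmesg | find_export_failures", and then turn the
flag off again with "rpcdebug -m nfsd -c export".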
The re-export server wants to encode into each filehandle something that
identifies the specific filesystem being exported. Otherwise it's stuck
when it gets a filehandle back from the client--the operation it uses to
map the incoming filehandle to a dentry can't work without a superblock.
In theory, if it can at least determine that the filehandle is for an
object on an NFS filesystem, and figure out which server the
filesystem's from, it could (given some new interface) ask the NFS
client to work out the rest.... I've got only vague ideas here and no
real plan for a fix.
So, for now at least, this just doesn't work.
--b.
Thanks for the insight. That looks to be the case.
With debugging turned on the following log entry appears on the
re-exporting server: "Cannot export /files/disk2, possibly unsupported
filesystem or fsid= required".
Understood that there are no plans for a fix in the near future, just
wanted to put it on the radar. Will figure out a workaround in the
meantime.
One of my pie-in-the-sky ideas here is to allow running the server in a
mode where it re-exports one server and has no other exports whatsoever.
That would mean that when the re-exporting server gets a filehandle, it
knows it originated from that one server, and if necessary it can do
things like issue a getattr on that filehandle for the fsid to find out
which filesystem it belongs to.
It's a major restriction, but I figure that might be what you're doing
anyway in a case like yours. Perhaps containers or VMs would be a
workaround if you want to serve more exports from the same physical
hardware.
But, anyway, I'm not actually sure how to make that work and it's not
currently on the todo list.
--b.